Abstract. The basic problem of optimal transportation consists in minimizing the expected cost $\mathbb{E}[c(X_1,X_2)]$ by varying the joint distribution of $(X_1,X_2)$, where the marginal distributions of the random variables $X_1$ and $X_2$ are fixed.

Inspired by recent applications in mathematical finance and connections with the peacock problem, we study this problem under the additional condition that $(X_i)_{i=1,2}$ is a martingale, i.e. $\mathbb{E}[X_2 \mid X_1] = X_1$.

We introduce a variational lemma that enables us to derive characteristic properties of optimal martingale transport plans for specific cost functions. In particular we identify a martingale coupling that resembles the classic monotone quantile coupling in several respects. In analogy with the celebrated theorem of Brenier-Rüschendorf the following behavior can be observed: if the initial distribution is continuous, then this "monotone martingale" is supported by the graphs of two functions $T_1, T_2 : \mathbb{R} \to \mathbb{R}$.



1. Introduction

1.1. Presentation of the martingale transport problem. We will denote by $\mathcal{P}$ the set of probability measures on $\mathbb{R}$ having finite first moments. We are given measures $\mu, \nu \in \mathcal{P}$ and a measurable cost function $c : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$. (Apart from one notable exception we will only be interested in continuous cost functions.) We assume moreover that $c(x,y) \ge a(x) + b(y)$ where $a$ (resp. $b$) is integrable with respect to $\mu$ (resp. $\nu$). Hence if $(X,Y)$ is a joint law with marginal distributions $\mathrm{law}(X) = \mu$ and $\mathrm{law}(Y) = \nu$, the expectation of $c(X,Y) \ge a(X) + b(Y)$ is well defined, taking its value in $[\mathbb{E}(a(X)) + \mathbb{E}(b(Y)), +\infty]$. We will refer to this technical hypothesis as the sufficient integrability condition. The basic problem of optimal transport consists in the minimization problem

(1) Minimize $\mathbb{E}[c(X,Y)]$ for $\mathrm{law}(X) = \mu$, $\mathrm{law}(Y) = \nu$,

where the infimum is taken over all joint distributions. We denote the infimum in (1) by $C(\mu,\nu)$. The joint laws on $\mathbb{R} \times \mathbb{R}$ are usually called transport plans after the classical concrete problem of Monge [21]: how can one transport a heap of soil distributed according to $\mu$ to a target distribution $\nu$? A transport plan $\pi$ prescribes that for $(x,y) \in \mathbb{R}^2$ a quantity of mass $\pi(dx\,dy)$ is transported from $x$ to $y$. Minimizers of the problem (1) are called optimal transport plans. Note that we will also use the more probabilistic term coupling for transport plans. Following [26] we denote the set of all transport plans by $\Pi(\mu,\nu)$ so that one has the alternative definition

$C(\mu,\nu) = \inf_{\pi \in \Pi(\mu,\nu)} \iint c(x,y)\, d\pi(x,y).$

Our main interest lies in a martingale version of the transport problem. That is, our aim is to minimize $\mathbb{E}[c(X,Y)]$ over the set of all martingale transport plans

$\Pi_M(\mu,\nu) = \{\pi \in \Pi(\mu,\nu) : \pi = \mathrm{law}(X,Y) \text{ and } \mathbb{E}[Y \mid X] = X\}.$

A transport plan $\pi$ is equivalently described through its disintegration $(\pi_x)_{x \in \mathbb{R}}$ with respect to the initial distribution $\mu$. The probabilistic interpretation is that $(x,A) \mapsto \pi_x(A)$ is the transition kernel of the two-step process $(X_i)_{i=1,2}$ where $X_1 = X$ and $X_2 = Y$, i.e. $\pi_x(A) = \mathbb{P}(Y \in A \mid X = x)$. In these terms, $\pi$ is an element of $\Pi_M(\mu,\nu)$ iff $\int y \, d\pi_x(y) = x$ holds $\mu$-a.s. Hence in this paper we study the minimization problem

(2) Minimize $\mathbb{E}_\pi[c] = \iint c(x,y)\, d\pi(x,y)$ for $\pi \in \Pi_M(\mu,\nu)$

for various costs. Let $C_M(\mu,\nu)$ denote the infimum $\inf\{\mathbb{E}_\pi c : \pi \in \Pi_M(\mu,\nu)\}$.
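For finitely supported measures the martingale constraint and the objective in (2) reduce to finite sums. The following Python sketch is an illustration only and not part of the original text; the representation of a plan as a dictionary `{(x, y): mass}` and the function names `marginals`, `is_martingale` and `cost` are assumptions made for this example.

```python
from fractions import Fraction as F

def marginals(pi):
    """Return the marginals (mu, nu) of a discrete plan pi = {(x, y): mass}."""
    mu, nu = {}, {}
    for (x, y), m in pi.items():
        mu[x] = mu.get(x, 0) + m
        nu[y] = nu.get(y, 0) + m
    return mu, nu

def is_martingale(pi):
    """Check sum_y y * pi(x, y) == x * mu(x) for every atom x of mu."""
    mu, _ = marginals(pi)
    moment = {}
    for (x, y), m in pi.items():
        moment[x] = moment.get(x, 0) + y * m
    return all(moment[x] == x * mu[x] for x in mu)

def cost(pi, c):
    """Expected cost of the plan pi for a cost function c(x, y)."""
    return sum(m * c(x, y) for (x, y), m in pi.items())

# mu = delta_0 and nu = (delta_{-1} + delta_1) / 2: the only coupling is a martingale.
pi = {(0, -1): F(1, 2), (0, 1): F(1, 2)}
print(is_martingale(pi), cost(pi, lambda x, y: abs(y - x)))
```

Exact rational arithmetic is used so that the equality test in the martingale condition is not affected by rounding.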

Our optimal transport approach allows us to distinguish some special couplings in $\Pi_M(\mu,\nu)$ that are comparable to the monotone (or Hoeffding-Fréchet) coupling $\pi_{HF} \in \Pi(\mu,\nu)$. Indeed we have developed our martingale transport theory parallel to the classical theory and the optimizer of (2) will enjoy canonical properties. Nevertheless notable differences occur between the theories. An obvious one is the fact that $\Pi_M(\mu,\nu)$ can be empty while $\Pi(\mu,\nu)$ always contains the element $\mu \otimes \nu$. The existence of a martingale transport plan is actually quite an old topic that has been present (under different names) at least since the study of Muirhead's inequality by Hardy, Littlewood, and Pólya [TU]. Several articles in different fields (analysis, combinatorics, potential theory and probability) deal with this question in different settings, often for marginal distributions in spaces much more general than the real line (see e.g. [lillinilSlllSllTlITHllS]). The interest in finding an explicit coupling has appeared recently in the peacock problem (see [TT] and the references therein): a peacock is a stochastic process $(X_t)_{t \in I}$ such that there exists at least one martingale $(M_t)_{t \in I}$ satisfying $\mathrm{law}(X_t) = \mathrm{law}(M_t)$ for every $t$. The problem consists in building as explicitly as possible such a martingale $(M_t)$ from $(X_t)$. The martingale transport problem is perhaps even more closely linked to the theory of model-independent pricing in mathematical finance; we refer to [13] for a recent survey of this area. Indeed the problem (2) was first studied in this context by Hobson and Neuberger [TS] for the specific cost function $c(x,y) = -|y - x|$. The link between optimal transport and model-independent pricing has been made explicit in [2] in a discrete time framework and in [9] in a continuous time setup.

We already note that several of the basic features of the problem (2) are similar to the usual optimal transport problem. This applies for instance to the weak compactness of $\Pi(\mu,\nu)$ and $\Pi_M(\mu,\nu)$. If $c$ is lower semi-continuous, this carries over to the mapping $\pi \mapsto \mathbb{E}_\pi[c]$ for either space of transport plans.

In particular the infimum is attained. Note also that as in the standard setup the problem has a natural dual formulation [5]. However, as we already mentioned in the previous paragraph, while there is always a transport plan which moves $\mu$ to $\nu$, the marginal distributions need to satisfy additional assumptions to guarantee that a martingale transport plan exists: the set $\Pi_M(\mu,\nu)$ is non-empty if and only if $\mu$ is smaller than $\nu$ in the convex order (see Definition 2.1). More details are provided in Section 2 along with a construction of a martingale transport plan between two given marginals.

1.2. Summary on the classical transport problem on $\mathbb{R}$. A cornerstone in the modern theory of optimal transportation is Brenier-Rüschendorf's celebrated theorem [H I22j. It treats the optimal transport problem in the particular case $c(x,y) = |y - x|^2$, where $|\cdot|$ denotes the Euclidean norm on $\mathbb{R}^n$. This is simply Problem (1) when $\mu$ and $\nu$ are interpreted as measures on $\mathbb{R}^n$. Under appropriate regularity conditions on $\mu$, the optimal transport $\pi \in \Pi(\mu,\nu)$ is unique and supported by the graph of a function $T : \mathbb{R}^n \to \mathbb{R}^n$ that is the gradient of some convex function. In particular the optimal transport is realized by a mapping. Note that in dimension one the gradient of a convex function is simply a monotonically increasing function so that the optimal coupling is the usual monotone coupling. This fact can be proved directly without too many difficulties (see for instance [16]) but nevertheless it is interesting as one of the rare cases where an optimal transport plan can be so easily understood. Moreover, even without any assumption on $\mu$, the monotone coupling is the unique optimal transport plan. In this paper we will see that similar results are valid in the martingale case, e.g. the uniqueness of the minimizer or the fact that the optimal coupling is concentrated on a special set comparable to the graph of a monotone mapping.

We present the classical (non-martingale) optimal transport problem on $\mathbb{R}$ that will serve as a guideline to our paper. It is developed for an arbitrary strictly convex cost. Any cost of this type leads to the same theory, which again is characteristic of dimension one.

Theorem 1.1. Let $\mu, \nu$ be probability measures and $c$ a cost function defined by $c(x,y) = h(y - x)$, where $h : \mathbb{R} \to \mathbb{R}$ is a strictly convex function. We assume that $c$ satisfies the sufficient integrability condition with respect to $\mu$ and $\nu$ and that $C(\mu,\nu) < \infty$. The following statements are equivalent:

(1) The measure $\pi$ is optimal.

(2) The transport preserves the order, i.e. there is a set $\Gamma$ with $\pi(\Gamma) = 1$ such that whenever $(x,y), (x',y') \in \Gamma$, if $x < x'$ one has also $y \le y'$.

We have the two following corollaries.

Corollary 1.2. For given measures $\mu$ and $\nu$, if $C(\mu,\nu)$ is finite then there exists a unique minimizer to the transport problem (1) and it is the monotone coupling $\pi_{HF}$.

One has in fact $\pi_{HF} = (G_\mu \otimes G_\nu)_\# \lambda_{[0,1]}$ where $\lambda$ is the Lebesgue measure and $G_\mu$ and $G_\nu$ are the quantile functions of $\mu$ and $\nu$, i.e. the non-decreasing and left-continuous functions obtained from the cumulative distribution functions $F_\mu$ and $F_\nu$ as a generalized inverse by the formula $G(s) = \inf\{t \in \mathbb{R} : s \le F(t)\}$. This observation is the reason for the alternative name quantile coupling. For the second corollary recall that a measure $\mu$ is said to be continuous if $\mu(\{x\}) = 0$ for every $x \in \mathbb{R}$.

Corollary 1.3. Under the same hypothesis, if $\mu$ is continuous then the optimal transport plan $\pi_{HF}$ is concentrated on the graph of an increasing mapping $T : \mathbb{R} \to \mathbb{R}$. Moreover $T_\# \mu = \nu$.

It is straightforward to see that $T = G_\nu \circ F_\mu$. This formula determines $T$, $\mu$-a.s.
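For finitely supported measures the quantile coupling can be computed by sweeping both measures from left to right and matching mass greedily. The sketch below is an illustration under stated assumptions and not taken from the original text: measures are dictionaries `{atom: mass}` with equal total mass and exact (integer or `Fraction`) weights, and the function name `quantile_coupling` is hypothetical.

```python
from fractions import Fraction as F

def quantile_coupling(mu, nu):
    """Monotone (Hoeffding-Frechet) coupling of two discrete measures.

    mu, nu: dicts {atom: mass} with the same total mass and exact weights.
    Returns the plan as a dict {(x, y): mass}.
    """
    xs, ys = sorted(mu.items()), sorted(nu.items())
    pi, i, j = {}, 0, 0
    rem_x, rem_y = xs[0][1], ys[0][1]
    while True:
        m = min(rem_x, rem_y)          # move as much mass as both atoms allow
        if m > 0:
            key = (xs[i][0], ys[j][0])
            pi[key] = pi.get(key, 0) + m
        rem_x -= m
        rem_y -= m
        if rem_x == 0:                 # source atom exhausted: go to the next one
            i += 1
            if i == len(xs):
                break
            rem_x = xs[i][1]
        if rem_y == 0:                 # target atom exhausted: go to the next one
            j += 1
            if j == len(ys):
                break
            rem_y = ys[j][1]
    return pi

mu = {0: F(1, 2), 1: F(1, 2)}
nu = {0: F(1, 4), 2: F(1, 2), 5: F(1, 4)}
print(quantile_coupling(mu, nu))
```

In this example the atom of $\mu$ at 0 is split between the targets 0 and 2, which is possible only because $\mu$ has atoms; for continuous $\mu$ Corollary 1.3 excludes such splitting.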

Quadratic costs in the martingale setting. While $c(x,y) = (y - x)^2$ is arguably the most important cost function in the theory of optimal transport we stress that it plays a rather different role in the martingale setup. Assume that $\mathrm{law}(X) = \mu$ and $\mathrm{law}(Y) = \nu$ are linked by a martingale coupling $\pi$ and possess second moments. Then

$\mathbb{E}[XY] = \mathbb{E}[\mathbb{E}[XY \mid X]] = \mathbb{E}[X^2],$

hence we have the Pythagorean relation

$\iint (y - x)^2 \, d\pi(x,y) = \mathbb{E}[(Y - X)^2] = \mathbb{E}[Y^2] - \mathbb{E}[X^2].$

Thus the cost associated to $\pi$ depends only on the marginal distributions, i.e. not on the particular choice of $\pi \in \Pi_M(\mu,\nu)$.

We record the following consequence: Let $c$ be a cost function and assume that

$\tilde{c}(x,y) = c(x,y) + p \cdot (y - x)^2 + q \cdot (y - x)$

for some real constants $p$ and $q$. Then in Problem (2) the minimizers are the same for the costs $c$ and $\tilde{c}$, since the quadratic term contributes a constant depending only on the marginals and the linear term integrates to zero by the martingale property. In particular, if $c(x,y) = h(y - x)$, we do not expect that monotonicity or convexity properties of the function $h$ are relevant for the structure of the optimizer.
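The Pythagorean relation can be checked numerically on a small example. The sketch below is only an illustration; the two plans `pi_a` and `pi_b` are hand-built martingale couplings (an assumption of this example, not data from the original text) between $\mu = \frac12(\delta_{-1} + \delta_1)$ and the uniform measure $\nu$ on $\{-3,-1,1,3\}$.

```python
from fractions import Fraction as F

def cost(pi, c):
    """Expected cost of a discrete plan pi = {(x, y): mass}."""
    return sum(m * c(x, y) for (x, y), m in pi.items())

# Two different martingale couplings with the same marginals.
pi_a = {(-1, -3): F(1, 4), (-1, 1): F(1, 4),
        (1, -1): F(1, 4), (1, 3): F(1, 4)}
pi_b = {(-1, -3): F(1, 8), (-1, -1): F(1, 4), (-1, 1): F(1, 8),
        (1, -3): F(1, 8), (1, 1): F(1, 8), (1, 3): F(1, 4)}

quadratic = lambda x, y: (y - x) ** 2
absolute = lambda x, y: abs(y - x)

# Quadratic cost: identical for both plans, equal to E[Y^2] - E[X^2] = 5 - 1.
print(cost(pi_a, quadratic), cost(pi_b, quadratic))
# A non-quadratic cost does distinguish the two plans.
print(cost(pi_a, absolute), cost(pi_b, absolute))
```

Both plans have quadratic cost 4, whereas their costs for $|y - x|$ differ, so the choice of martingale coupling matters only for costs that are not of the form $p(y-x)^2 + q(y-x)$ plus functions of $x$ and $y$ alone.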

1.3. A new coupling: the monotone martingale coupling, main results.

In this section we will discuss a particular coupling which may be viewed as a martingale analogue to the monotone (Hoeffding-Fréchet) coupling. Notable similarities are that it is canonical with respect to the convex order as well as that it is optimal for a range of different cost functions.

Definition 1.4. A martingale transport plan $\pi$ on $\mathbb{R} \times \mathbb{R}$ is left-monotone or simply monotone if there exists a Borel set $\Gamma \subseteq \mathbb{R} \times \mathbb{R}$ with $\pi(\Gamma) = 1$ such that whenever $(x, y^-), (x, y^+), (x', y') \in \Gamma$ we cannot have (see Figure 1 where this situation is represented)

(3) $x < x'$ and $y^- < y' < y^+$.

Respectively $\pi$ is said to be right-monotone if there exists $\Gamma$ such that if $(x, y^-)$, $(x, y^+)$ and $(x', y')$ are elements of $\Gamma$ then we do not have

$x > x'$ and $y^- < y' < y^+$.

We will refer to the set $\Gamma$ as the monotonicity set of $\pi$.
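For a finitely supported plan every set of full measure contains the support, so Definition 1.4 can be tested directly on the support. The following Python sketch is an illustration only; the function name `is_left_monotone` and the two example supports are assumptions of this example.

```python
def is_left_monotone(support):
    """Return True if no triple (x, y-), (x, y+), (x', y') in the support
    satisfies the forbidden configuration x < x' and y- < y' < y+."""
    pts = set(support)
    targets = {}
    for x, y in pts:
        targets.setdefault(x, []).append(y)
    for x, ys in targets.items():
        lo, hi = min(ys), max(ys)      # extreme targets of x give the widest gap
        for xp, yp in pts:
            if x < xp and lo < yp < hi:
                return False
    return True

# Supports of two martingale couplings between (delta_{-1} + delta_1)/2
# and the uniform measure on {-3, -1, 1, 3}.
support_a = [(-1, -3), (-1, 1), (1, -1), (1, 3)]
support_b = [(-1, -3), (-1, -1), (-1, 1), (1, -3), (1, 1), (1, 3)]
print(is_left_monotone(support_a), is_left_monotone(support_b))
```

The first support fails because the point $-1$, which receives mass from $x' = 1$, lies strictly between the two targets $-3$ and $1$ of $x = -1$; the second support contains no such configuration.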

In this paper we will only state the results for (left-)monotone couplings. The corresponding results for right-monotone couplings can be deduced easily. We illustrate the forbidden situation (3) in Figure 1. Note that the top line represents the measure $\mu$ while $\nu$ is distributed on the bottom line; this convention will also be used in the subsequent pictures.

The next theorem is proved in Section 5.

Theorem 1.5. Let $\mu, \nu$ be probability measures in convex order. Then there exists a unique (left-)monotone transport plan in $\Pi_M(\mu,\nu)$. We denote this coupling by $\pi_{lc}$ and call it the left-curtain coupling (the name is explained before Theorem 4.18).

Figure 1. The forbidden mapping.

Of course one does not expect that a martingale is concentrated on the graph of a deterministic mapping $T$; this holds only in the trivial case when $\mu = \nu$ and $T(x) = x$. Rather we have the following result.

Corollary 1.6. Let $\mu, \nu$ be probability measures in convex order and assume that $\mu$ is continuous. Then there exist a Borel set $S \subseteq \mathbb{R}$ and two measurable functions $T_1, T_2 : S \to \mathbb{R}$ such that

(1) $\pi_{lc}$ is concentrated on the graphs of $T_1$ and $T_2$.

(2) For all $x \in S$, $T_1(x) \le x \le T_2(x)$.

(3) For all $x < x' \in \mathbb{R}$, $T_2(x) < T_2(x')$ and $T_1(x') \notin \,]T_1(x), T_2(x)[$.

The following picture (Figure 2) illustrates the coupling $\pi_{lc}$ in a specific case. The measures $\mu$ and $\nu$ are Gaussian distributions having the same mean, the variance of $\nu$ being greater than the variance of $\mu$. There exist two points at which the density of $\mu$ (w.r.t. Lebesgue measure) equals the density of $\nu$. Denote the smaller of these points by $x_0$. Then we have $T_1(x) = T_2(x) = x$ for $x \le x_0$. For $x > x_0$, the map $T_1$ is strictly decreasing and $T_2$ is strictly increasing.

The subsequent result states that the transport plan $\pi_{lc}$ is optimal for a variety of different cost functions (see Theorem 6.1 below).



Theorem 1.7 ($\pi_{lc}$ is optimal). Let $\mu, \nu$ be probability measures in convex order. Assume that $c(x,y) = h(y - x)$ for some differentiable function $h$ whose derivative is strictly convex and that $c$ satisfies the sufficient integrability condition. If $C_M(\mu,\nu) < \infty$ then $\pi_{lc}$ is the unique optimizer.

Natural examples of cost functions to which the result applies are given by $c(x,y) = (y - x)^3$ and $c(x,y) = \exp(y - x)$.

We discuss a further characteristic property of the transport plan $\pi_{lc}$. For a real number $t$ and $\pi \in \Pi(\mu,\nu)$ consider the measure

$\nu_t^\pi = \mathrm{proj}^y_\# \, \pi|_{]-\infty,t] \times \mathbb{R}}$

where $\mathrm{proj}^y : (a,b) \in \mathbb{R}^2 \mapsto b \in \mathbb{R}$. Loosely speaking, the mass $\mu|_{]-\infty,t]}$ is moved to $\nu_t^\pi$ by the transport plan $\pi$. It is intuitively clear (and not hard to verify) that a transport plan $\pi \in \Pi(\mu,\nu)$ is uniquely determined by the family $(\nu_t^\pi)_{t \in \mathbb{R}}$.

Using this notation, the classic monotone transport plan $\pi_{HF}$ is characterized by the fact that for each $t$, the measure $\nu_t^{\pi_{HF}}$ is as far to the left as possible. More precisely, for every $t$ the measure $\nu_t^{\pi_{HF}}$ is minimal with respect to first order stochastic dominance in the family

$\{\nu_t^\pi : \pi \in \Pi(\mu,\nu)\}.$

Figure 2. Scheme of the left-curtain coupling $\pi_{lc}$ between two Gaussian measures.

We have the following analogous characterization for the monotone martingale coupling $\pi_{lc}$. This is in fact the way we will define $\pi_{lc}$ in Theorem 4.18.

Theorem 1.8 ($\pi_{lc}$ is canonical for the convex order). For every real number $t$ the measure $\nu_t^{\pi_{lc}}$ is minimal with respect to the convex order (i.e. second order stochastic dominance) in the family

$\{\nu_t^\pi : \pi \in \Pi_M(\mu,\nu)\}.$

The next theorem summarizes the properties of $\pi_{lc}$.

Theorem 1.9. Let $\mu, \nu$ be probability measures in convex order. Let $h : \mathbb{R} \to \mathbb{R}$ be a differentiable function such that $h'$ is strictly convex and assume that the cost function $c : (x,y) \mapsto h(y - x)$ satisfies the sufficient integrability condition.

We assume moreover $C_M(\mu,\nu) < +\infty$. Let $\pi$ be a martingale coupling in $\Pi_M(\mu,\nu)$. The following statements are equivalent:

• The coupling $\pi$ is monotone,

• The coupling $\pi$ is optimal,

• The coupling $\pi$ is the left-curtain coupling $\pi_{lc}$: for every $(\pi', t) \in \Pi_M(\mu,\nu) \times \mathbb{R}$, the measure $\nu_t^{\pi}$ is smaller than $\nu_t^{\pi'}$ in the convex order.

Note that Theorem 1.9 is a consequence of the other results stated above.



1.4. A variational lemma for the martingale transport problem. An important basic tool in optimal transport is the notion of $c$-cyclical monotonicity (see [27, Chapter 4]) which links the optimality of transport plans to properties of the support of the transport plan. A parallel statement holds true in the present setup and plays a fundamental role in our considerations. Heuristically we expect that if $\pi \in \Pi_M(\mu,\nu)$ is optimal, then it will prescribe optimal movements for single particles. To give a precise formulation we use the following notion.

Definition 1.10. Let $\alpha$ be a measure on $\mathbb{R} \times \mathbb{R}$ with finite first moment in the second variable. We say that $\alpha'$, a measure on the same space, is a competitor of $\alpha$ if $\alpha'$ has the same marginals as $\alpha$ and for $(\mathrm{proj}^x_\# \alpha)$-a.e. $x \in \mathbb{R}$

$\int y \, d\alpha_x(y) = \int y \, d\alpha'_x(y),$

where $(\alpha_x)_{x \in \mathbb{R}}$ and $(\alpha'_x)_{x \in \mathbb{R}}$ are disintegrations of the measures with respect to $\mathrm{proj}^x_\# \alpha$.

Lemma 1.11 (Variational Lemma). Assume that $\mu, \nu$ are probability measures in convex order and that $c : \mathbb{R}^2 \to \mathbb{R}$ is a Borel measurable cost function satisfying the sufficient integrability condition. Assume that $\pi \in \Pi_M(\mu,\nu)$ is an optimal martingale transport plan which leads to finite costs. Then there exists a Borel set $\Gamma$ with $\pi(\Gamma) = 1$ such that the following holds:

If $\alpha$ is a measure on $\mathbb{R} \times \mathbb{R}$ with $|\mathrm{spt}(\alpha)| < \infty$ and $\mathrm{spt}(\alpha) \subseteq \Gamma$, then we have $\int c \, d\alpha \le \int c \, d\alpha'$ for every competitor $\alpha'$ of $\alpha$.

This variational lemma is one of the key ingredients in our investigation of the monotone martingale transport plan $\pi_{lc}$ introduced above. Moreover it turns out to be very useful if one seeks to derive results on the optimizers for various specific cost functions. Assuming for simplicity that $\mu$ is continuous, Lemma 1.11 allows us to derive the following results.

(1) If $c(x,y) = (y - x)^4$, then $\mathrm{card}(\mathrm{spt}\, \pi_x) \le 3$, $\mu(x)$-a.s.

(2) Assume that $c(x,y) = h(y - x)$ for some continuously differentiable function $h$ and that the derivative $h'$ intersects every affine function in at most $k \in \mathbb{N}$ points. Then $\mathrm{card}(\mathrm{spt}\, \pi_x) \le k$, $\mu(x)$-a.s. for the optimizing $\pi$. (See Theorem 7.1 and also Theorem 7.2 for a similar result which appeals to the classical transport problem.)

(3) If $c(x,y) = -|y - x|$, then there is a unique optimizer $\pi \in \Pi_M(\mu,\nu)$. Moreover $\mathrm{card}(\mathrm{spt}\, \pi_x) \le 2$, $\mu(x)$-a.s. (This was first shown in [HoNell], see Theorem 7.3.)

(4) If $c(x,y) = |y - x|$, then there is a unique optimizer $\pi \in \Pi_M(\mu,\nu)$. Moreover $\mathrm{card}(\mathrm{spt}\, \pi_x) \le 3$ and $\mathrm{card}(\mathrm{spt}\, \pi_x \setminus \{x\}) \le 2$, $\mu(x)$-a.s. (see Theorem 7.4).



1.5. Organisation of the paper. We will start with a warm-up section (Section 2) in which we derive some basic properties and explain a procedure that allows one to find a martingale coupling for two given measures in convex order. Then, in Section 3, we establish the variational lemma, Lemma 1.11, which will play a crucial role throughout the paper. In Section 4 we introduce and study the shadow projection, which permits us to introduce the left-curtain transport plan $\pi_{lc}$. We define it in Theorem 4.18 through its canonical property with respect to the convex order, we explain the name "left-curtain" and prove that it is monotone in Theorem 4.21. The particular properties of the transport plan $\pi_{lc}$ are established in Sections 5 and 6. In Section 7 we present results related to other costs and other couplings. Finally, in the Appendix, we give an alternative derivation of the variational lemma, Lemma 1.11. This second proof is longer but has the advantage of being constructive and self-contained.

2. Construction of a martingale transport plan for measures.

In this section we extend the martingale optimal transport problem to general finite measures with finite first moment and we define the convex order on this space. We prove that there exists a martingale transport plan between two measures in convex order and give a very short description of the duality theory linked to our optimization problem.

2.1. Basic notions. Denote by $\mathcal{M}$ the set of finite measures on $\mathbb{R}$ having finite first moment. We consider it with the usual topology, i.e. we say that a sequence $(\nu_n)_n$ converges weakly in $\mathcal{M}$ to an element $\nu \in \mathcal{M}$ if

(1) $(\nu_n)_n$ converges weakly in the usual sense, i.e. using continuous bounded functions as test functions;

(2) the sequence $\int |x| \, d\nu_n$ converges to $\int |x| \, d\nu$.

Note that this is the same as adding all functions that grow at most linearly at $\pm\infty$ to the set $C_b$ of continuous and bounded test functions.

The reason we are interested in the space $\mathcal{M}$ is that we will also need to consider transport plans between measures $\mu, \nu \in \mathcal{M}$ which have (the same) mass $k$, where $k$ is possibly different from 1. In direct generalization of the earlier definition the set of transport plans $\Pi(\mu,\nu)$ then consists of all Borel measures $\pi$ on $\mathbb{R} \times \mathbb{R}$ satisfying $\mathrm{proj}^x_\# \pi = \mu$, $\mathrm{proj}^y_\# \pi = \nu$. As a consequence of Prohorov's Theorem the set $\Pi(\mu,\nu)$ is compact; see e.g. [27, Lemma 4.4] for details. If $c$ is a continuous (or lower semi-continuous) cost function satisfying the sufficient integrability condition with respect to $\mu$ and $\nu$ then the cost functional

$\pi \in \Pi(\mu,\nu) \mapsto \int c \, d\pi \in \,]-\infty, +\infty]$

is lower semi-continuous w.r.t. the weak topology ([27, Lemma 4.3]). It follows that the infimum in the classic transport problem is attained.

We proceed analogously in the martingale setup. If $\mu$ and $\nu$ are not necessarily probabilities, we define $\Pi_M(\mu,\nu)$ to consist of all transport plans $\pi$ such that the disintegration $(\pi_x)_{x \in \mathbb{R}}$ w.r.t. $\mu$ satisfies

$\int y \, d\pi_x(y) = x$

for $\mu$-almost every $x$. It is not difficult to see that this property can be tested using continuous functions: $\pi \in \Pi(\mu,\nu)$ is a martingale if and only if

(4) $\int \rho(x)(y - x) \, d\pi(x,y) = 0$

for all continuous bounded functions $\rho : \mathbb{R} \to \mathbb{R}$ (see [2, Lemma 2.3]). Hence the set $\Pi_M(\mu,\nu)$ is compact in the weak topology (see [H Proposition 2.4]). Precisely as in the usual setup it follows that the value of the minimization problem (2) is attained provided that the set $\Pi_M(\mu,\nu)$ is non-empty.

Of course it is a fundamental question under which conditions martingale transport plans exist. In the usual optimal transport setup the problem is simple enough: the properly renormalized product measure $\frac{1}{\mu(\mathbb{R})}\, \mu \otimes \nu$ witnesses that $\Pi(\mu,\nu)$ is non-empty. As mentioned in the introduction, the proper notion which guarantees existence of a martingale transport plan is the convex order. As it plays a crucial role throughout the paper we will discuss it in some detail.

2.2. The convex order of measures. Let us start with the definition.

Definition 2.1. Two measures $\mu$ and $\nu$ are said to be in convex order if

(1) they have finite mass and finite first moments, i.e. lie in $\mathcal{M}$,

(2) for convex functions $\varphi$ defined on $\mathbb{R}$, $\int \varphi \, d\mu \le \int \varphi \, d\nu$.

In that case we will write $\mu \preceq_c \nu$.

Note that if $\mu \preceq_c \nu$ then one can apply (2) to all the affine functions. Using the particular choices $\varphi(x) = 1$ and $\varphi(x) = -1$ one obtains that $\mu$ and $\nu$ have the same total mass and considering the functions $\varphi(x) = x$ and $\varphi(x) = -x$ one finds that $\mu$ and $\nu$ have the same barycenter.

It is useful to know that it is sufficient to test hypothesis (2) against suitable subclasses of the convex functions. For instance measures $\mu, \nu$ having the same finite mass and the same first moments are in convex order if and only if

$\int (x - k)_+ \, d\mu(x) \le \int (x - k)_+ \, d\nu(x)$

for all real $k$. This follows from simple approximation arguments (see [12] and also Paragraph 4.1) using monotone convergence. In particular it is sufficient to check (2) for positive convex functions with finite limit slope at $-\infty$ and $+\infty$.
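For finitely supported measures the criterion with the functions $(x - k)_+$ becomes a finite test: both integrals are piecewise linear in $k$ with kinks only at atoms, and they coincide for $k$ outside the range of the atoms once mass and first moment agree, so it suffices to compare them at the atoms. The Python sketch below illustrates this; it is not part of the original text, and the names `call_integral` and `in_convex_order` are hypothetical.

```python
from fractions import Fraction as F

def call_integral(measure, k):
    """Integral of (x - k)_+ against a discrete measure {atom: mass}."""
    return sum(m * max(x - k, 0) for x, m in measure.items())

def in_convex_order(mu, nu):
    """Test mu <=_c nu for finitely supported measures with exact weights."""
    mass = lambda g: sum(g.values())
    moment = lambda g: sum(x * m for x, m in g.items())
    if mass(mu) != mass(nu) or moment(mu) != moment(nu):
        return False
    # The difference of the two integrals is piecewise linear in k, vanishes
    # outside the range of the atoms, and has kinks only at atoms.
    return all(call_integral(mu, k) <= call_integral(nu, k)
               for k in set(mu) | set(nu))

mu = {-1: F(1, 2), 1: F(1, 2)}
nu = {y: F(1, 4) for y in (-3, -1, 1, 3)}
print(in_convex_order(mu, nu), in_convex_order(nu, mu))
```

Here $\mu \preceq_c \nu$ holds while the reverse relation fails, in line with the fact that $\nu$ is more spread out around the common barycenter.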
We give some examples of measures in convex order.

Example 2.2. If $\delta$ is an atom of mass $\alpha > 0$ at the point $x$, then $\delta \preceq_c \nu$ simply means that $\nu$ has mass $\alpha$ and barycenter $x$.

Example 2.3. If $\mu_i \preceq_c \nu_i$ for $i = 1, \dots, n$ then $\sum_{i=1}^n \mu_i \preceq_c \sum_{i=1}^n \nu_i$.

Example 2.4. If two measures $\mu$ and $\mu'$ have the same barycenter and the same mass, $\mu$ is concentrated on $[a,b]$ and $\mu'$ is concentrated on $\mathbb{R} \setminus \,]a,b[$ then $\mu \preceq_c \mu'$. Indeed it can be proved for convex functions $\varphi$ defined on $\mathbb{R}$ that

$\int \varphi \, d\mu \le \int \psi \, d\mu \le \int \psi \, d\mu' \le \int \varphi \, d\mu'$

where $\psi$ is the convex function that is linear on $[a,b]$ and equal to $\varphi$ outside $[a,b]$.

Example 2.5. If two measures $\mu$ and $\mu'$ have the same barycenter and the same mass, $\mu - (\mu \wedge \mu')$ is concentrated on $[a,b]$ and $\mu' - (\mu \wedge \mu')$ is concentrated on $\mathbb{R} \setminus \,]a,b[$ then we have $\mu \preceq_c \mu'$. To see this, one can apply Example 2.4 to the two reduced measures. Adding $\mu \wedge \mu'$ conserves the order.

The following result formally states the connection between the convex order and the existence of martingale transport plans.

Theorem 2.6. Let $\mu, \nu \in \mathcal{M}$. The condition $\mu \preceq_c \nu$ is necessary and sufficient for the existence of a martingale transport plan in $\Pi_M(\mu,\nu)$.

Footnotes: The convex order is also called the Choquet order or second order stochastic dominance. The barycenter of a measure $\mu$ is the first moment of the normalized measure $\frac{1}{\mu(\mathbb{R})}\mu$.

It is a simple consequence of Jensen's inequality that the condition $\mu \preceq_c \nu$ is necessary to have $\Pi_M(\mu,\nu) \ne \emptyset$: if $\pi$ is a martingale transport plan and $\varphi$ is convex then

$\int \varphi(y) \, d\nu(y) = \iint \varphi(y) \, d\pi(x,y) = \int \Big( \int \varphi(y) \, d\pi_x(y) \Big) d\mu(x) \ge \int \varphi(x) \, d\mu(x).$

The fact that the condition is also sufficient is well known and goes back at least to a paper by Strassen [211. Nevertheless we think that it is worthwhile to describe a procedure which allows one to obtain a martingale transport plan. This is what we do in the next subsection.

2.3. Construction of a martingale transport. We fix finite measures $\mu, \nu$ having finite first moments and satisfying $\mu \preceq_c \nu$; our aim is to show that $\Pi_M(\mu,\nu)$ is non-empty. The desired result will first be given in the case that $\mu$ is concentrated on finitely many points.

Proposition 2.7. Assume that $\mu = \sum_{i=1}^n \delta_i$ where each $\delta_i$ is an atomic measure. If $\nu$ satisfies $\mu \preceq_c \nu$, then $\Pi_M(\mu,\nu)$ is non-empty.

First note that with Example 2.2 this proposition is clear if $n = 1$. We will establish Proposition 2.7 in an inductive way. To perform the inductive step we need to understand how to couple a single atom $\delta = \alpha \delta_x$ (for instance $\delta_1$) with a properly chosen portion $\nu'$ of $\nu$ so that the other atoms ($\sum_{i=2}^n \delta_i$) are smaller than $\nu - \nu'$ in convex order. Recalling Example 2.2 we should pick $\nu'$ so that it has mass $\alpha$ and barycenter $x$. Clearly, it also needs to satisfy $\nu' \le \nu$ where $\le$ refers to the usual pointwise order of measures.

As $\delta$ is a part of $\mu$ and $\mu \preceq_c \nu$, we can introduce the measure $\bar\mu = \mu - \delta$ which has mass $t = \mu(\mathbb{R}) - \alpha$. Obviously we then have $\delta + \bar\mu \preceq_c \nu$. We are looking for the measure $\nu'$ among the measures $\{\nu_s : s \in [0,t]\}$ obtained as the restriction of $\nu$ between two quantiles $s$ and $s' = s + \alpha$. More precisely we consider $\nu_s = G_\# \lambda_{[s,s+\alpha]}$ where $G : [0, t + \alpha] \to \mathbb{R}$ is the generalized inverse of the cumulative distribution function of $\nu$, and $\lambda_{[s,s']}$ is the Lebesgue measure restricted to $[s,s']$. For completeness note that $\nu = G_\# \lambda_{[0,t+\alpha]}$.

The barycenter $B(s,\nu)$ of $\nu_s$ depends continuously on the parameter $s \in [0,t]$ and we claim that

(5) $B(0,\nu) \le x, \quad B(t,\nu) \ge x.$

Indeed this is a consequence of the convex order relation $(\delta + \bar\mu) \preceq_c \nu$ applied to the functions $u \mapsto (u - G(\alpha))_-$ and $u \mapsto (u - G(t))_+$. By the intermediate value theorem, this implies that there exists some $s \in [0,t]$ such that $\nu_s$ has barycenter $x$. Moreover if $B(s,\nu) = B(s',\nu)$, the measures $\nu_s$ and $\nu_{s'}$ are the same so that there exists a unique measure with barycenter $x$. We denote it by $\nu'$.

This discussion leads us to the following lemma.

Lemma 2.8. Let $\mu$ be of the form $\mu = \bar\mu + \delta$, where $\delta$ is an atom, and assume that $\mu \preceq_c \nu$. Then there exists a unique splitting of the measure $\nu$ into two positive measures $\nu'$ and $\bar\nu = \nu - \nu'$ in such a way that

(1) $\delta \preceq_c \nu'$,

(2) $\bar\nu(I) = 0$, where $I$ is the interior of $\mathrm{conv}(\mathrm{spt}(\nu'))$, the smallest interval containing the support of $\nu'$.

Moreover the measures $\bar\mu$ and $\bar\nu$ satisfy $\bar\mu \preceq_c \bar\nu$.

Proof. Having already constructed $\nu'$ (and $I$, i.e. $]G(s), G(s + \alpha)[$ if $I$ is bounded) in the paragraph above Lemma 2.8, it remains to show the final assertion: $\bar\mu$ is smaller than $\bar\nu$ in the convex order. Let $\varphi$ be a non-negative convex function which satisfies

$\limsup_{|x| \to \infty} |\varphi(x)/x| < +\infty.$

We will prove that $\int \varphi \, d\bar\mu \le \int \varphi \, d\bar\nu$. To this end we introduce a new function $\psi$ which equals $\varphi$ on $\mathbb{R} \setminus I$ and is linear on $I$. The function $\psi$ can be chosen to be convex and satisfy $\psi \ge \varphi$. (Note that this is possible also in the case where $I$ is unbounded.) The functions $\varphi$ and $\psi$ coincide on the border of $I$. We have

$\int \varphi \, d\bar\mu \le \int \psi \, d\bar\mu = \int \psi \, d\mu - \int \psi \, d\delta.$

But as $\psi$ is linear on $I$, one has $\int \psi \, d\delta = \int \psi \, d\nu'$ and because $\mu \preceq_c \nu$ one has $\int \psi \, d\mu \le \int \psi \, d\nu$. It follows that

$\int \varphi \, d\bar\mu \le \int \psi \, d\nu - \int \psi \, d\nu' = \int \psi \, d\bar\nu = \int \varphi \, d\bar\nu.$

The last equality is due to the fact that $\bar\nu$ is concentrated on $\mathbb{R} \setminus I$. We have thus established our claim that $\bar\mu \preceq_c \bar\nu$. □



Proof of Proposition 2.7. In the first step we apply Lemma 2.8 to the measures $\delta = \delta_1$ and $\bar\mu = \sum_{i=2}^n \delta_i$ to obtain a splitting $\nu = \nu_1 + \bar\nu$ that satisfies $\delta_1 \preceq_c \nu_1$ and $\bar\mu \preceq_c \bar\nu$. Trivially $\Pi_M(\delta_1, \nu_1)$ consists of a single element $\pi_1$.

In the next step we repeat the procedure with $\bar\mu$ and $\bar\nu$ in the place of $\mu, \nu$ and continue until the $n$-th step where $\delta_n$ can be martingale-transported to the remaining part of $\nu$ because the convex order relation $\delta_n \preceq_c \nu - \sum_{i=1}^{n-1} \nu_i$ is of the type classified in Example 2.2. Hence we have obtained recursively a sequence $(\nu_i)_{i=1}^n$ such that $\delta_i \preceq_c \nu_i$ and $\nu_1 + \cdots + \nu_n = \nu$. We have constructed $n$ martingale transport plans $\pi_1, \dots, \pi_n$ where $\pi_i$ is the unique element of $\Pi_M(\delta_i, \nu_i)$. Thus $\pi_1 + \cdots + \pi_n$ is an element of $\Pi_M(\mu,\nu)$. □
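The inductive construction can be carried out exactly when $\nu$ is finitely supported as well. In that case the unnormalized barycenter $s \mapsto \int_s^{s+\alpha} G(u)\,du$ is piecewise linear, so the level $s$ can be found by linear interpolation between breakpoints. The following Python sketch is an illustration under these assumptions and not the authors' implementation; the names `window`, `split_atom` and `martingale_coupling` are hypothetical, and the input is assumed to satisfy $\mu \preceq_c \nu$ with exact (integer or `Fraction`) data.

```python
from fractions import Fraction as F

def window(atoms, s, a):
    """Part of a sorted atom list [(y, mass), ...] between quantile levels s and s + a."""
    part, c = {}, 0
    for y, m in atoms:
        overlap = min(c + m, s + a) - max(c, s)
        if overlap > 0:
            part[y] = overlap
        c += m
    return part

def first_moment(g):
    return sum(y * m for y, m in g.items())

def split_atom(nu, x, a):
    """Quantile window nu' of nu with mass a and barycenter x (cf. Lemma 2.8)."""
    atoms = sorted(nu.items())
    t = sum(nu.values()) - a                  # admissible levels are s in [0, t]
    cuts, c = {F(0), F(t)}, 0
    for _, m in atoms:                        # breakpoints of the piecewise linear map H
        c += m
        cuts.update(F(u) for u in (c, c - a) if 0 <= u <= t)
    cuts = sorted(cuts)
    H = lambda s: F(first_moment(window(atoms, s, a)))
    target, s = F(a) * x, cuts[0]
    for lo, hi in zip(cuts, cuts[1:]):
        if H(lo) <= target <= H(hi):          # intermediate value step, solved exactly
            s = lo if H(hi) == H(lo) else lo + (hi - lo) * (target - H(lo)) / (H(hi) - H(lo))
            break
    part = window(atoms, s, a)
    if first_moment(part) != target:
        raise ValueError("no window with the required barycenter; is mu <=_c nu?")
    return part

def martingale_coupling(mu, nu):
    """Couple the atoms of mu one at a time with quantile windows of what is left of nu."""
    pi, rest = {}, dict(nu)
    for x, a in mu.items():
        for y, m in split_atom(rest, x, a).items():
            pi[(x, y)] = m
            rest[y] -= m
            if rest[y] == 0:
                del rest[y]
    return pi

mu = {-1: F(1, 2), 1: F(1, 2)}
nu = {y: F(1, 4) for y in (-3, -1, 1, 3)}
print(martingale_coupling(mu, nu))
```

As noted after the proof of Theorem 2.6, the plan obtained this way is in no sense canonical in general: it depends on the order in which the atoms of $\mu$ are processed.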



To extend Proposition 2.7 to the case of general $\mu \in \mathcal{M}$ we need the following simple and straightforward fact that will also be useful in Section 4.

Lemma 2.9 (Approximation of a measure in the convex order). Let $\gamma \in \mathcal{M}$. There exists a sequence $(\gamma^{(n)})_n$ of finitely supported measures such that $\gamma^{(n)} \preceq_c \gamma^{(n+1)} \preceq_c \gamma$ and $(\gamma^{(n)})_n$ converges weakly to $\gamma$ in $\mathcal{M}$.

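Proof. To any partition $\mathcal{J}$ of $\mathbb{R}$ into finitely many intervals we can associate a measure $\gamma_{\mathcal{J}}$ that is smaller than $\gamma$ in the convex order. We simply replace $\gamma = \sum_{I \in \mathcal{J}} \gamma|_I$ by $\gamma_{\mathcal{J}} = \sum_{I \in \mathcal{J}} \delta_I$, where $\delta_I$ is an atom with the same mass and the same barycenter as $\gamma|_I$. Note that if $\mathcal{J}'$ is finer than $\mathcal{J}$ (the intervals of $\mathcal{J}$ are broken into sub-intervals) then $\gamma_{\mathcal{J}} \preceq_c \gamma_{\mathcal{J}'}$. For $k, N \in \mathbb{N}$ we consider the partition

$\mathcal{J}_{k,N} = \big\{\, ]\tfrac{i}{2^k}, \tfrac{i+1}{2^k}] : i = -N 2^k, \dots, N 2^k - 1 \,\big\} \cup \big\{\, ]N, +\infty[ \,,\ ]-\infty, -N] \,\big\}$

and set $\gamma_{k,N} = \gamma_{\mathcal{J}_{k,N}}$. We have $\gamma_{k,N} \preceq_c \gamma_{k+1,N}$ and $\gamma_{k,N} \preceq_c \gamma_{k,N+1}$. Write $\gamma^{(n)}$ for $\gamma_{n,n}$. Let $f$ be a continuous function that grows at most linearly at $\pm\infty$. There exist $a, b > 0$ such that $|f(x)| \le a|x| + b$. Let $\varepsilon > 0$ and $N$ be such that $\int_{|x| > N} (a|x| + b) \, d\gamma(x) \le \varepsilon/3$. The function $f$ is uniformly continuous on $[-N, N]$. Thus there exists $\omega > 0$ such that if $x, y \in [-N, N]$ and $|x - y| < \omega$ we have $|f(x) - f(y)| \le \varepsilon/3$. Let $k$ be such that $1/2^k < \omega$. For $n \ge \max\{k, N\}$ we have

$|\gamma(f) - \gamma^{(n)}(f)| \le \Big| \int_{-N}^{N} f \, d\gamma - \int_{-N}^{N} f \, d\gamma^{(n)} \Big| + \Big| \int_{|x| > N} f \, d\gamma \Big| + \Big| \int_{|x| > N} f \, d\gamma^{(n)} \Big| \le \gamma(\mathbb{R}) \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3}.$

The first two estimates are a consequence of our preparations; for the third one we note that

$\Big| \int_{|x| > N} f \, d\gamma^{(n)} \Big| \le \int_{|x| > N} (a|x| + b) \, d\gamma^{(n)} \le \int_{|x| > N} (a|x| + b) \, d\gamma,$

where the convexity of $x \mapsto a|x| + b$ and the relation $\gamma^{(n)}|_{|x| > N} \preceq_c \gamma|_{|x| > N}$ are used. □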
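The replacement of a measure by atoms at cell barycenters is easy to carry out for finitely supported $\gamma$. The Python sketch below is only an illustration of that step for the partition $\mathcal{J}_{k,N}$; the function name `coarsen` and the example measure are assumptions of this example.

```python
import math
from fractions import Fraction as F

def coarsen(gamma, k, N):
    """Replace gamma = {atom: mass} on each cell of J_{k,N} by a single atom
    with the same mass, placed at the barycenter of gamma on that cell."""
    def cell(x):
        if x > N:
            return "right"                 # the cell ]N, +infinity[
        if x <= -N:
            return "left"                  # the cell ]-infinity, -N]
        return math.ceil(x * 2 ** k)       # x lies in ](i-1)/2^k, i/2^k] for i = ceil(x 2^k)
    mass, moment = {}, {}
    for x, m in gamma.items():
        j = cell(x)
        mass[j] = mass.get(j, 0) + m
        moment[j] = moment.get(j, 0) + x * m
    out = {}
    for j in mass:
        b = F(moment[j]) / F(mass[j])      # barycenter of the cell
        out[b] = out.get(b, 0) + mass[j]
    return out

gamma = {F(1, 8): F(1, 2), F(3, 8): F(1, 2)}
print(coarsen(gamma, 1, 1))   # both atoms share the cell ]0, 1/2]
print(coarsen(gamma, 2, 1))   # cells of length 1/4 separate the two atoms
```

Total mass and first moment are preserved by construction, and refining the partition (larger `k`) moves the result up in the convex order towards $\gamma$.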



We are now finally in the position to conclude the proof of Theorem 2.6.

Proof of sufficiency in Theorem 2.6. Pick a sequence of finitely supported measures $(\mu_n)_{n \ge 1}$ satisfying $\mu_n \preceq_c \nu$ such that $\mu_n$ converges to $\mu$ weakly. (By Lemma 2.9 the sequence could be chosen to be increasing in the convex order, but we do not need this here.) We have already solved the problem of transporting a discrete distribution. Pick martingale measures $(\pi_n)_{n \ge 1}$ which transport $\mu_n$ to $\nu$ for each $n$. To be able to pass to a limit, we note that the set

$\Pi(\mu,\nu) \cup \bigcup_{n \ge 1} \Pi(\mu_n,\nu)$

is compact. Hence the sequence $(\pi_n)_{n \ge 1}$ has an accumulation point $\pi$ in this set and of course $\pi$ is as desired: its marginals are $\mu$ and $\nu$ and it is a martingale transport plan. □



We have thus seen a self-contained proof of Theorem 2.6. Of course the reader may object that the martingale constructed in the course of the proof was in no sense canonical and that the derivation was not constructive, since we have invoked a compactness argument to prove the existence in the case of a general measure $\mu$. In Section 4 we will be concerned with a modification of the above ideas which does not suffer from these shortfalls.

2.4. A dual problem. We mention that the martingale transport problem (2) admits a dual formulation. In analogy to the dual part of the optimal transport problem one may consider

(5) Maximize $\int \varphi \, d\mu + \int \psi \, d\nu$

where one maximizes over all functions $\varphi \in L^1(\mu)$, $\psi \in L^1(\nu)$ such that there exists $\Delta \in C_b(\mathbb{R})$ satisfying

(6) $c(x,y) \ge \varphi(x) + \psi(y) + \Delta(x)(y - x)$

for all $x, y \in \mathbb{R}$. Denote the corresponding supremal value by $D$. The inequality $D \le C_M(\mu,\nu)$ then follows by integrating (6) against $\pi \in \Pi_M(\mu,\nu)$. In the case of lower semi-continuous costs $c$ the duality relation $D = C_M(\mu,\nu)$ is established in [2, Theorem 1.1]. We also note that the dual part of the problem appears naturally in mathematical finance where it has a canonical interpretation in terms of replication. We refer to [2] for more details on this topic.
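For the reader's convenience, the integration step behind the weak duality inequality can be spelled out as follows. This is only a sketch; the notation $\pi_x$ for the conditional law of the second coordinate given the first (so that $\int y \, d\pi_x(y) = x$ for $\mu$-almost every $x$) is an assumption of this illustration.

```latex
% Weak duality: integrate the pointwise inequality (6) against a
% martingale transport plan \pi with marginals \mu and \nu.
\begin{align*}
\int c \, d\pi
  &\ge \int \varphi(x) \, d\pi(x,y) + \int \psi(y) \, d\pi(x,y)
     + \int \Delta(x)(y - x) \, d\pi(x,y) \\
  &= \int \varphi \, d\mu + \int \psi \, d\nu
     + \int \Delta(x) \Big( \int (y - x) \, d\pi_x(y) \Big) d\mu(x) \\
  &= \int \varphi \, d\mu + \int \psi \, d\nu .
\end{align*}
% The last term vanishes by the martingale property; it is well defined
% because \Delta is bounded and \mu, \nu have finite first moments.
% Taking the supremum on the right and the infimum on the left gives D \le C_M(\mu,\nu).
```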

3. A SHORT PROOF OF THE VARIATIONAL LEMMA 



The aim of this section is to establish the variational lemma, Lemma 1.11. That is, for a given optimal martingale transport plan $\pi$ we want to construct a Borel set $\Gamma$, $\pi(\Gamma) = 1$, such that the following holds: if $\alpha$ is a measure on $\mathbb{R} \times \mathbb{R}$ with $|\operatorname{spt}(\alpha)| < \infty$ and $\operatorname{spt}(\alpha) \subseteq \Gamma$ then we have $\int c \, d\alpha \le \int c \, d\alpha'$ for every competitor $\alpha'$ of $\alpha$.

As mentioned above this result can be viewed as a substitute for the characterization of optimality through the notion of c-cyclical monotonicity in the classical setup. Under mild regularity assumptions it is simple enough to show that a transport plan $\pi$ which is optimal for the (usual) transport problem is c-cyclically monotone; we refer to [27, Theorem 5.10]. However this approach does not translate effortlessly to the martingale case. Roughly speaking, the main problem in the present setup is that the martingale condition makes manipulation of transport plans a relatively delicate issue.

Instead we give here a proof of Lemma 1.11 that is based on certain measure theoretic tools: it requires a general duality theorem of Kellerer [12, Lemma 1.8(a), Corollary 2.18] which in turn requires Choquet's capacitability theorem [8].^1 See the Appendix for an alternative and constructive proof of the variational lemma.

The crucial ingredient is the following result: 

Theorem 3.1. Let $(Z, \zeta)$ be a Polish probability space and $M \subseteq Z^n$. Then one of the following holds true:

(1) There exist subsets $(M_i)_{i=1}^n$ of $Z^n$ such that $\zeta(\operatorname{proj}^i M_i) = 0$ for $i = 1, \ldots, n$ and
\[ M \subseteq \bigcup_{i=1}^{n} M_i . \]

(2) There exists a measure $\gamma$ on $Z^n$ such that $\gamma(M) > 0$ and $\operatorname{proj}^i_\# \gamma \le \zeta$ for $i = 1, \ldots, n$.

We refer to [1, Proposition 2.1] for a detailed proof of Theorem 3.1 from Kellerer's result.

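Proof of Lemma 1.11. Fix a number $n \in \mathbb{N}$. We want to construct a set $\Gamma_n$ for which the optimality property holds for all $\alpha$ satisfying $|\operatorname{spt} \alpha| \le n$. This set $\Gamma_n$ will satisfy $\pi(\Gamma_n) = 1$. Clearly $\Gamma = \bigcap_{n \in \mathbb{N}} \Gamma_n$ is then as required to establish the lemma.

For a fixed $n \in \mathbb{N}$, define a Borel set $M$ as the set of all $(x_i, y_i)_{i=1}^n$ for which there exists $\alpha$ such that

(1) $\alpha$ is a measure on $\mathbb{R} \times \mathbb{R}$,
(2) $\operatorname{spt} \alpha \subseteq \{(x_i, y_i) : i = 1, \ldots, n\}$, and
(3) there exists a competitor $\alpha'$ satisfying $\int c \, d\alpha' < \int c \, d\alpha$.

^1 This approach is inspired by [1] where c-cyclical monotonicity is linked to optimality with the help of Kellerer's result.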
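We then apply Theorem 3.1 to the space $(Z, \zeta) = (\mathbb{R}^2, \pi)$ and the set $M$. If we are in case (1), let $N$ be $\bigcup_{i=1}^n \operatorname{proj}^i(M_i)$ so that $\pi(N) = 0$ and $M \subseteq (N \times Z^{n-1}) \cup \ldots \cup (Z^{n-1} \times N) = Z^n \setminus (Z \setminus N)^n$. We can then simply define $\Gamma_n := Z \setminus N = \mathbb{R}^2 \setminus N$ to obtain a set which does not support any non-optimal $\alpha$ with $|\operatorname{spt} \alpha| \le n$. Moreover $\pi(\Gamma_n) = 1$ as we want, hence the proof is complete.

It remains to show that case (2) cannot occur. Striving for a contradiction we assume that there is a measure $\gamma$ such that $\gamma(M) > 0$ and $\operatorname{proj}^i_\# \gamma \le \pi$ for $i = 1, \ldots, n$. Restricting $\gamma$ to $M$, we may of course assume that $\gamma((\mathbb{R} \times \mathbb{R})^n \setminus M) = 0$. Rescaling $\gamma$ if necessary we may also assume that $\operatorname{proj}^i_\# \gamma \le \frac{1}{n} \pi$.

Consider the measure $\omega := \sum_{i=1}^n \operatorname{proj}^i_\# \gamma$. It is smaller than $\pi$ and has positive mass. In particular $\operatorname{proj}^x_\# \omega \le \mu$. We will find a competitor $\omega'$ (recall Definition 1.10) such that $\omega'$ leads to a smaller cost than $\omega$, i.e.

\[ \int c(x, y) \, d\omega' < \int c(x, y) \, d\omega. \]

If such a measure $\omega'$ exists then the measure $\pi - \omega + \omega'$ is a martingale transport plan which leads to smaller costs than $\pi$, contradicting the optimality of $\pi$. It remains to explain how $\omega'$ is obtained. For each $p = ((x_1, y_1), \ldots, (x_n, y_n)) \in (\mathbb{R} \times \mathbb{R})^n$ let $\alpha_p$ be the measure which is uniformly distributed on the set $\{(x_1, y_1), \ldots, (x_n, y_n)\}$. Then

\[ \omega = \int_{p \in (\mathbb{R} \times \mathbb{R})^n} \alpha_p \, d\gamma(p). \]

For each $p \in (\mathbb{R} \times \mathbb{R})^n$ let $\alpha'_p$ be an optimizer of the problem

\[ \text{Minimize } \int_{(x,y) \in \mathbb{R} \times \mathbb{R}} c(x,y) \, d\beta(x,y), \quad \beta \text{ competitor of } \alpha_p. \]

We emphasize that $\alpha'_p$ exists and can be taken to depend measurably on $p$. This follows for instance by calculating $\alpha'_p$ using the simplex algorithm.

As $\gamma$ is concentrated on $M$, for $\gamma$-almost all points $p$ the measure $\alpha'_p$ satisfies

\[ \int_{(x,y) \in \mathbb{R} \times \mathbb{R}} c(x,y) \, d\alpha'_p(x,y) < \int_{(x,y) \in \mathbb{R} \times \mathbb{R}} c(x,y) \, d\alpha_p(x,y). \]

(Note that $\alpha'_p$ is in general not concentrated on the same set as $\alpha_p$.) Then $\omega'$ defined by

\[ \omega' = \int_{p \in (\mathbb{R} \times \mathbb{R})^n} \alpha'_p \, d\gamma(p) \]

satisfies the above conditions as required. For instance we have

\[ \int_{\mathbb{R} \times \mathbb{R}} c \, d\omega' = \int_{p \in (\mathbb{R} \times \mathbb{R})^n} \int_{(x,y) \in \mathbb{R} \times \mathbb{R}} c(x,y) \, d\alpha'_p(x,y) \, d\gamma(p) < \int_{p \in (\mathbb{R} \times \mathbb{R})^n} \int_{(x,y) \in \mathbb{R} \times \mathbb{R}} c(x,y) \, d\alpha_p(x,y) \, d\gamma(p) = \int_{\mathbb{R} \times \mathbb{R}} c \, d\omega. \]

The other properties are checked analogously. □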



We note that the proof of Lemma 1.11 just given also works in more general setups: the result remains valid if martingale transport plans between higher dimensional spaces are considered. In a different direction, one may state versions which appeal to more than two prescribed marginals, i.e. martingales $(X_i)_{i=1}^n$ with more than just two steps $X_1 = X$ and $X_2 = Y$.
Lemma 1.11 will often be applied in conjunction with the technical assertion below. Given $\Gamma \subseteq \mathbb{R}^2$ we will use the notation $\Gamma_x$ for $\{y \in \mathbb{R} : (x, y) \in \Gamma\}$.

Lemma 3.2. Let $k$ be a positive integer and $\Gamma \subseteq \mathbb{R}^2$. Assume also that there are uncountably many $a \in \mathbb{R}$ satisfying $|\Gamma_a| \ge k$.

There exist $a$ and $b_1 < \ldots < b_k \in \Gamma_a$ such that for every $\varepsilon > 0$ one may find $a' > a$ and $b'_1 < \ldots < b'_k \in \Gamma_{a'}$ with

\[ \max(|a - a'|, |b_1 - b'_1|, \ldots, |b_k - b'_k|) < \varepsilon. \]

Moreover one may also find $a'' < a$ and $b''_1 < \ldots < b''_k \in \Gamma_{a''}$ with

\[ \max(|a - a''|, |b_1 - b''_1|, \ldots, |b_k - b''_k|) < \varepsilon. \]

Proof. Write $A$ for the set of all $a$ such that $|\Gamma_a| \ge k$ and pick for each $a \in A$ distinct elements $b^a_1 < \ldots < b^a_k \in \Gamma_a$. Set $\Gamma_A = \{(a, b^a_1, \ldots, b^a_k) : a \in A\}$. We call $(a, b^a_1, \ldots, b^a_k) \in \Gamma_A$ a right-accumulation point if for every $\varepsilon > 0$ there exists $a' \in ]a, a + \varepsilon[$ such that $|b^a_i - b^{a'}_i| < \varepsilon$ for every $i$. We call it right-isolated otherwise. If $p$ belongs to the set $I_r$ of right-isolated points of $\Gamma_A$ then there exists some $\varepsilon_p > 0$ such that

\[ \big[\{p\} + ((0, \varepsilon_p) \times (-\varepsilon_p, \varepsilon_p)^k)\big] \cap \Gamma_A = \emptyset, \]

where $+$ refers to the Minkowski sum of sets.

Assume for contradiction that the set $I_r$ is uncountable. Then there exists some $\zeta > 0$ such that $K = \{p \in I_r : \varepsilon_p > \zeta\}$ is uncountable. Given $p_1, p_2 \in K$, we have $p_2 \notin p_1 + ((0, \zeta) \times (-\zeta, \zeta)^k)$. Since $p_1$ and $p_2$ have different first coordinates, this implies

\[ \big[\{p_1\} + ((0, \zeta/2) \times (-\zeta/2, \zeta/2)^k)\big] \cap \big[\{p_2\} + ((0, \zeta/2) \times (-\zeta/2, \zeta/2)^k)\big] = \emptyset. \]

This is a contradiction since there cannot be uncountably many disjoint open sets in $\mathbb{R}^{k+1}$.

It follows that for all but countably many $a \in A$ the point $(a, b^a_1, \ldots, b^a_k)$ is a right-accumulation point. Arguing the same way with left replacing right we obtain the desired conclusion. □

4. Existence of a monotone martingale transport plan: the 
left-curtain transport plan 

A short way to find a monotone martingale transport plan would be to take a minimizer of Problem (2) for $c(x, y) = h(y - x)$ where $h$ is appropriate. Then one may apply Lemma 1.11 to prove that this minimizer is monotone. This kind of argument will be encountered in Sections 6 and 7 below. Here however we find it useful to give a construction which yields more insight into the structure of the martingale transport plan. In particular it will also allow us to prove the uniqueness of a monotone martingale transport plan in Section 5 and it does not require any assumptions on $\mu$ and $\nu$.

For our argument, we reconsider the construction used in Proposition 2.7 and decide to transport the atoms $\delta_i$ of $\mu = \sum_i \delta_i$ to $\nu$ in a particular order, starting with the left-most atom and continuing to the right. It turns out that one can characterize the martingale coupling that we obtain in terms of an extended convex order and shadow introduced in this part (see Definition 4.3 and Lemma 4.6). These notions enable us to adapt the construction directly to the continuous case, thus making the approximation procedure used in Paragraph 2.3 obsolete.
4.1. Potential functions. An important tool in this section will be the so-called potential functions. For each $\mu \in \mathcal{M}$ we define the potential function $u_\mu : \mathbb{R} \to \mathbb{R}$ by

\[ u_\mu(x) = \int_{-\infty}^{+\infty} |y - x| \, d\mu(y) \]

for $x \in \mathbb{R}$. Set $k = \mu(\mathbb{R})$ and $m = \frac{1}{k} \int x \, d\mu$.

Proposition 4.1. If $\mu$ is in $\mathcal{M}$ and $k = \mu(\mathbb{R})$, $m = \frac{1}{k} \int x \, d\mu$, then $u_\mu$ has the following properties:
(i) $u_\mu$ is convex,
(ii) $\lim_{x \to -\infty} u_\mu(x) - k|x - m| = 0$ and $\lim_{x \to +\infty} u_\mu(x) - k|x - m| = 0$.
Conversely, if $f$ is a function satisfying these properties for some numbers $m \in \mathbb{R}$ and $k \in [0, +\infty[$, then there exists a unique measure $\mu \in \mathcal{M}$ such that $f = u_\mu$. The measure $\mu$ is one half the second derivative $f''$ in the sense of distributions.

Proof. See for instance the proof of Proposition 2.1 in [12]. □

Let us list some relevant properties of potential functions. 

Proposition 4.2. Let $\mu$ and $\nu$ be in $\mathcal{M}$.

• If $\mu$ and $\nu$ have the same mass, $\mu \preceq_C \nu$ is equivalent to $u_\mu \le u_\nu$.

• We have $\mu \le \nu$ if and only if $u_\mu$ has smaller curvature than $u_\nu$. More precisely $\mu \le \nu$ if and only if $u_\nu - u_\mu$ is convex.

• A sequence of measures $(\mu_n)_n$ in $\mathcal{M}$ with mass $k$ and mean $m$ converges weakly in $\mathcal{M}$ to some $\mu$ if and only if $(u_{\mu_n})_n$ converges pointwise. In that case $u_\mu = \lim_{n \to +\infty} u_{\mu_n}$.

Proof. For the first property see [11, Exercise 1.7], for the third [12, Proposition 2.3]. The second property is a consequence of Proposition 4.1. Namely $2\mu$ and $2\nu$ are the second derivatives of $u_\mu$ and $u_\nu$. □
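For finitely supported measures the first property gives a finite test for the convex order. The sketch below is an illustration only: the representation of a measure as a list of (position, mass) pairs and the names `potential` and `convex_order_leq` are assumptions of this example. It relies on the observation that, for equal mass and mean, the difference of the two potential functions is piecewise linear with kinks only at atoms and vanishes outside the convex hull of the atoms, so it suffices to compare the potentials at the atoms.

```python
from fractions import Fraction


def potential(atoms, x):
    """Potential function u_mu(x) = sum of m * |y - x| over the atoms (y, m)."""
    return sum(m * abs(y - x) for y, m in atoms)


def convex_order_leq(mu, nu):
    """Test mu <=_C nu for finitely supported measures (exact arithmetic assumed).

    Equal mass and mean are necessary; given these, u_nu - u_mu is piecewise
    linear with kinks only at atoms and zero far out, so checking the
    inequality u_mu <= u_nu at all atoms is enough.
    """
    mass = lambda atoms: sum(m for _, m in atoms)
    first_moment = lambda atoms: sum(m * y for y, m in atoms)
    if mass(mu) != mass(nu) or first_moment(mu) != first_moment(nu):
        return False
    points = [y for y, _ in mu] + [y for y, _ in nu]
    return all(potential(mu, x) <= potential(nu, x) for x in points)
```

For instance, a unit atom at 0 is dominated by the measure putting mass 1/2 on each of -1 and 1, but not conversely.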

We will need the following generalization of the convex order. 

Definition 4.3 (Extended convex order on $\mathcal{M}$). Let $\mu$ and $\nu$ be measures in $\mathcal{M}$. We write $\mu \preceq_E \nu$ and say that $\nu$ is greater than $\mu$ in the extended convex order if for any non-negative convex function $\varphi : \mathbb{R} \to \mathbb{R}$ we have

\[ \int \varphi \, d\mu \le \int \varphi \, d\nu. \]

Trivially, if $\mu \preceq_C \nu$ then we have also $\mu \preceq_E \nu$. Conversely if measures $\mu, \nu$ have the same mass and mean, then $\mu \preceq_E \nu$ implies that $\mu \preceq_C \nu$. This is easy to see since every convex function is positive up to adding an affine function. The partial order $\preceq_C$ on $\mathcal{M}$ is extended by the order $\preceq_E$ in the sense that $\preceq_E$ gives rise to new relations. The measures $\mu, \nu$ no longer need to have the same mass and barycenter. For instance if $\mu \le \nu$, we have $\mu \preceq_E \nu$ while the two measures will not have the same barycenters in general. In fact we have the following simple characterization of the extended order in terms of $\preceq_C$ and $\le$.

Proposition 4.4. Assume that $\mu \preceq_E \nu$. Then there exists a measure $\theta \le \nu$ such that $\mu \preceq_C \theta$.

Of course the converse statement is true as well: if there exists $\theta$ such that $\mu \preceq_C \theta$ and $\theta \le \nu$ then we have also $\mu \preceq_E \nu$.

Proof. Let $\mu$ and $\nu$ satisfy $\mu \preceq_E \nu$. We can assume that $\nu$ is a probability measure and denote by $k$ and $m$ the mass resp. the mean of $\mu$. We define a measure $\theta \le \nu$ as follows. Consider $G_\nu$, the generalized inverse of the cumulative function of $\nu$. Recall that $\lambda$ is the Lebesgue measure on $\mathbb{R}$. For a parameter $\zeta \in [0, k]$, we denote by $\lambda_\zeta$ the restriction of $\lambda$ to $[0, 1] \setminus [\zeta, \zeta + (1 - k)]$. This measure has mass $k$, as does $\theta = (G_\nu)_\# \lambda_\zeta$. We now fix $\zeta$ in order to make $\theta$ have mean $m$. This can be done because the mean of $\theta$ is a continuous function of $\zeta$, and we can use the intermediate value theorem. Indeed one can consider the non-negative and convex functions $x \mapsto (x - G_\nu(1 - k))_+$ and $x \mapsto (G_\nu(k) - x)_+$ and evaluate them with $\mu$ and $\nu$. It follows from the extended convex order relation that $m$ is indeed an intermediate value between the means of $\theta$ for $\zeta = 0$ and $\zeta = k$.

We are now given two measures $\mu$ and $\theta$ of the same mass and the same mean. Consider a convex function $\varphi$. We want to prove that its integral with respect to $\mu$ is smaller than the one with respect to $\theta$. For that we can assume without loss of generality $\varphi(G_\nu(\zeta)) = \varphi(G_\nu(\zeta + (1 - k))) = 0$. Thus

\[ \int \varphi(x) \, d\mu(x) \le \int \varphi_+(x) \, d\mu(x) \le \int \varphi_+(x) \, d\nu(x) = \int \varphi_+(x) \, d\theta(x) = \int \varphi(x) \, d\theta(x). \]

This concludes the proof. □

4.2. Maximal and minimal elements. For $\mu \preceq_E \nu$ let $F^\nu_\mu$ be the set of measures $\eta$ such that $\mu \preceq_C \eta$ and $\eta \le \nu$. Note that the measures in $F^\nu_\mu$ have the same mass and the same barycenter as $\mu$. In the next lemmas we consider the partially ordered set $(F^\nu_\mu, \preceq_C)$ and show that it has both a maximal and a minimal element.

Lemma 4.5. For $\mu \preceq_E \nu$, the set $F^\nu_\mu$ has an element which is maximal w.r.t. the convex order, i.e. there exists $T^\nu(\mu)$ such that

(i) $T^\nu(\mu) \le \nu$,

(ii) $\mu \preceq_C T^\nu(\mu)$,

(iii) If $\eta$ is another measure satisfying (i) and (ii) then we have $\eta \preceq_C T^\nu(\mu)$.

Proof. Consider the measure $\theta$ defined as in Proposition 4.4 and let $\eta$ be another measure in $F^\nu_\mu$. We know that $\theta$ is concentrated outside an open interval $I$ and that it coincides with $\nu$ on $\mathbb{R} \setminus I$ so that $\theta|_{\mathbb{R} \setminus I} \ge \eta|_{\mathbb{R} \setminus I}$. Thus $\eta - (\eta \wedge \theta)$ is concentrated on $I$ whereas $\theta - (\eta \wedge \theta)$ is concentrated on $\mathbb{R} \setminus I$. It follows from Example 2.5 that $\eta \preceq_C \theta$. □

The existence of a minimal element is more involved and will play an important 
role subsequently. 

Lemma 4.6 (Shadow embedding). Let $\mu, \nu \in \mathcal{M}$ and assume $\mu \preceq_E \nu$. Then there exists a measure $S^\nu(\mu)$, called the shadow of $\mu$ in $\nu$, such that

(i) $S^\nu(\mu) \le \nu$,
(ii) $\mu \preceq_C S^\nu(\mu)$,
(iii) If $\eta$ is another measure satisfying (i) and (ii) then we have $S^\nu(\mu) \preceq_C \eta$.

As a consequence of (iii), the measure $S^\nu(\mu)$ is uniquely determined. Moreover it satisfies the following property:

(iii') If $\eta$ is a measure such that $\eta \le \nu$ and $\mu \preceq_E \eta$ then we have $S^\nu(\mu) \preceq_E \eta$.
Proof of Lemma 4.6. First observe that (iii') follows from Proposition 4.4 applied to $\mu$ and $\eta$.

We write $k$ (resp. $m$) for the mass (resp. the mean) of $\mu$.

The principal strategy of our proof is to rewrite the problem in terms of convex functions. Set $f = u_\mu$ and $g = u_\nu$.

The task is to find a convex function $h = u_{S^\nu(\mu)}$ such that

(1) $f \le h$ and $\lim_{|x| \to \infty} h(x) - k|x - m| = 0$,

(2) $h - g$ is concave, i.e. $h'' \le g''$ in a weak sense,

(3) For $h_2 \in U_F$ it holds $h \le h_2$.

We note that by Proposition 4.4 there exist functions satisfying Conditions (1) and (2). Hence the sets $F = \{\eta \mid \mu \preceq_C \eta \text{ and } \eta \le \nu\}$ and

\[ U_F = \{h \text{ is convex and satisfies (1) and (2)}\} = \{h = u_\eta \mid \eta \in F\} \]

are not empty. To show that there exists a function which also satisfies the third property we define

(7) $\bar h = \inf_{h \in U_F} h$

which is a priori not necessarily convex and set

\[ \hat h = \text{convex closure}(\bar h). \]

We will prove that $\hat h$ is in $U_F$. The Conditions (1) and (3) are clear. Let us prove (2), which is more difficult.

For $h \in U_F$, the function $h - g$ is concave. It follows that $\bar h - g = (\inf h) - g = \inf (h - g)$ is also concave. As we do not know yet whether $\bar h$ is convex or not we pursue the same strategy for $\hat h$ as what we did for $\bar h$. For that we will replace $U_F$ in (7) by a (possibly larger) set^2 of convex functions $U \supseteq U_F$ so that $\hat h = \inf_{h \in U} h$. This is possible if $U$ is the set of all functions $h : x \mapsto \frac{b \, h_1(x - a) + a \, h_2(x + b)}{a + b}$ for parameters $(a, b, h_1, h_2)$ satisfying $a, b \ge 0$, $(a, b) \ne (0, 0)$ and $h_1, h_2 \in U_F$. It remains only to prove that $h - g$ is concave for every $h \in U$.

Let s < t he real numbers and I an affine function such that h = g + I in the 
points s and t. We want to prove h > g + I on [s, t\. Let li and I2 both be affine 
and such that hi = g + li in s — a and t — a for i — 1, in s + b and t + b for i = 2. 
Then I = Moreover for x G [s, t] we have hi{x + b)> g{x + b) + li{x + b) 

and h2{x + b) > g{x + b) + l2{x + b). It follows that for x € [s, t] 

. , ag{x + b) + bg{x — a) ali{x + b) + bl2{x — a) 
a + b a + b 

>g{x)+l{x). 

Here the last inequality holds since g is convex. Finally we have proved that for 
h £ U, h — g is concave. Hence h — g is concave. □ 

Note that in Lemma 2.8 we have implicitly encountered the shadow in the case where the starting distribution consists of an atom.

Example 4.7 (Shadow of an atom). Let $\delta$ be an atom of mass $\alpha$ at a point $x$. Assume that $\delta \preceq_E \nu$. Then $S^\nu(\delta)$ is the restriction of $\nu$ between two quantiles, i.e. it is $\nu' = (G_\nu)_\# \lambda|_{]s, s'[}$ where $s' - s = \alpha$ and the barycenter of $\nu'$ is $x$. Let us consider another measure $\eta \in \mathcal{M}$ with $\delta \preceq_C \eta$ and $\eta \le \nu$. Applying the observation from Example 2.5 to $\nu'$ and $\eta$ we obtain $\nu' \preceq_C \eta$.

^2 A posteriori both sets are the same.
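When $\nu$ is finitely supported, the quantile window of Example 4.7 can be computed exactly. The sketch below is an illustration under assumptions that are not part of the text: $\nu$ is a list of (position, mass) pairs, the function name `shadow_of_atom` is hypothetical, and the window is located using the fact that the first moment of the window $[s, s+\alpha]$ is a nondecreasing, piecewise linear function of $s$ whose breakpoints are the cumulative masses of $\nu$ and their shifts by $-\alpha$.

```python
from fractions import Fraction


def shadow_of_atom(nu, x, a):
    """Shadow in nu of an atom of mass a located at x (finitely supported nu).

    nu: list of (position, mass) pairs (hypothetical input format).
    Returns the restriction of nu to the quantile window [s, s + a] whose
    barycenter is x, as a list of (position, mass), or None if none exists.
    """
    nu = sorted((Fraction(y), Fraction(m)) for y, m in nu if m > 0)
    x, a = Fraction(x), Fraction(a)
    total = sum(m for _, m in nu)
    if a > total:
        return None
    cum = [Fraction(0)]  # cumulative masses 0 = c_0 < c_1 < ...
    for _, m in nu:
        cum.append(cum[-1] + m)

    def window(s):
        # part of each atom whose quantile range [lo, hi] meets [s, s + a]
        parts = []
        for (y, _), lo, hi in zip(nu, cum, cum[1:]):
            w = min(hi, s + a) - max(lo, s)
            if w > 0:
                parts.append((y, w))
        return parts

    def moment(s):
        return sum(y * w for y, w in window(s))

    t = total - a
    breakpoints = sorted({s for c in cum for s in (c, c - a) if 0 <= s <= t} | {Fraction(0), t})
    target = a * x
    if not (moment(breakpoints[0]) <= target <= moment(breakpoints[-1])):
        return None
    for s0, s1 in zip(breakpoints, breakpoints[1:]):
        m0, m1 = moment(s0), moment(s1)
        if m0 <= target <= m1:
            # linear interpolation on the segment; a flat segment gives the same window
            s = s0 if m1 == m0 else s0 + (s1 - s0) * (target - m0) / (m1 - m0)
            return window(s)
    return window(breakpoints[0])  # only reached when the window is all of nu
```

For example, if $\nu$ puts mass 1/4 on each of -2, -1, 1, 2, the shadow of an atom of mass 1/2 at 0 is the middle half of $\nu$.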



4.3. Associativity of shadows. In this section we will establish the following 
associativity property of the shadow. 

Theorem 4.8 (Shadow of a sum). Let $\gamma_1, \gamma_2$ and $\nu$ be elements of $\mathcal{M}$ and assume that $\mu = \gamma_1 + \gamma_2 \preceq_E \nu$. Then we have $\gamma_2 \preceq_E \nu - S^\nu(\gamma_1)$ and

\[ S^\nu(\gamma_1 + \gamma_2) = S^\nu(\gamma_1) + S^{\nu - S^\nu(\gamma_1)}(\gamma_2). \]

In Figure 3 we can see the shadow in $\nu$ of $\mu = \gamma_1 + \gamma_2$ for two different ways of labeling the $\gamma_i$'s. In both cases $\nu_1 := S^\nu(\gamma_1)$ is simply $\gamma_1$. On the left part of the figure $S^{\nu - \nu_1}(\gamma_2)$ is quite intuitive while on the right part it is deduced from the associativity of the shadow projection. Actually it has to be $S^\nu(\mu) - \nu_1$.

Figure 3. Shadow of $\mu = \gamma_1 + \gamma_2$ in $\nu$.

Our proof of Theorem 4.8 will rely on approximations of $\mu$ by atomic measures and needs several auxiliary results. In our argument we will require a certain continuity property of the mapping $\nu \mapsto S^\nu(\delta)$ stated in Lemma 4.10. We will derive it now with the help of the Kantorovich metric.

Proposition 4.9 (Kantorovich metric). The function $W$, defined on $\mathcal{M}$ by

(8) $W(\nu, \hat\nu) = \sup_f \big( \int f \, d\nu - \int f \, d\hat\nu \big)$ if $\nu(\mathbb{R}) = \hat\nu(\mathbb{R})$, and $W(\nu, \hat\nu) = +\infty$ otherwise,

where the supremum is taken over all 1-Lipschitz functions $f : \mathbb{R} \to \mathbb{R}$, is a metric with values in $[0, +\infty]$. For $k > 0$, the associated topology on the subspace of measures of mass $k$ coincides with the weak topology introduced in Paragraph 2.1.

To simplify the discussion, we consider the case where $\nu, \hat\nu$ are probability measures. In this case, $W(\nu, \hat\nu)$ is the Kantorovich metric (also called 1-Wasserstein distance, or transport distance). It can also be written as $\|F_\nu - F_{\hat\nu}\|_1$ or $\|G_\nu - G_{\hat\nu}\|_1$ where $F_\nu$ is the cumulative distribution function of $\nu$ and $G_\nu$ its generalized inverse. The norm $\|\cdot\|_1$ refers to the $L^1$-norm for the Lebesgue measure on $\mathbb{R}$ resp. $[0,1]$. Recall that $\nu = (G_\nu)_\# \lambda$.
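The representation of $W$ through cumulative distribution functions is easy to evaluate for finitely supported measures. The following sketch is an illustration only; the list-of-(position, mass) representation and the function name `kantorovich` are assumptions of this example. It integrates the absolute difference of the two step functions between consecutive atoms, and returns infinity when the masses differ, in line with (8).

```python
from fractions import Fraction


def kantorovich(nu, nu_hat):
    """W(nu, nu_hat) as the L1 distance between the two distribution functions.

    nu, nu_hat: lists of (position, mass) pairs (hypothetical input format).
    """
    if sum(m for _, m in nu) != sum(m for _, m in nu_hat):
        return float("inf")
    # signed jumps of F_nu - F_nu_hat, ordered by position
    jumps = sorted([(y, m) for y, m in nu] + [(y, -m) for y, m in nu_hat])
    distance, diff, previous = 0, 0, None
    for y, jump in jumps:
        if previous is not None:
            # F_nu - F_nu_hat is constant (equal to diff) on [previous, y[
            distance += abs(diff) * (y - previous)
        diff += jump
        previous = y
    return distance
```

For two unit atoms at 0 and 1 this gives distance 1, the cost of moving one unit of mass over distance 1.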

Let us now fix some notations in preparation to Lemma 4.10. We fix a quantity $\alpha < 1$ and set $t = 1 - \alpha$. As in the discussion preceding Lemma 2.8 we consider for $s \in [0, t]$ the restriction $\nu_s = (G_\nu)_\# \lambda|_{[s, s+\alpha]}$ of $\nu$ between the quantiles $s$ and $s + \alpha$. We adopt the same convention for $\hat\nu$. Note that the corresponding barycenter can be written as

(9) $B(s, \nu) = \frac{1}{\alpha} \int_s^{s+\alpha} G_\nu(u) \, d\lambda(u)$.

Together with the representation of $W$ by quantile functions, (9) implies

\[ |B(s, \nu) - B(s, \hat\nu)| \le \frac{W(\nu, \hat\nu)}{\alpha}. \]

Moreover we can prove

\[ W(\nu_r, \nu_s) = \alpha \, |B(r, \nu) - B(s, \nu)| \]

without difficulty using for instance the identity function in (8). Another easy property is

\[ W(\nu_s, \hat\nu_s) \le W(\nu, \hat\nu). \]

This can be seen for instance as a consequence of the representation of $W$ by generalized inverse cumulative functions.

Let $x$ be an element of $\mathbb{R}$ and consider the subset of measures $\nu$ such that $B(0, \nu) \le x \le B(t, \nu)$. These are exactly the measures such that there exists $s \in [0, t]$ satisfying $B(s, \nu) = x$; for such $\nu$ the shadow $S^\nu(\delta) = \nu_s$ is well defined.

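Lemma 4.10. Let $\delta = \alpha \delta_x$ be an atom of mass $\alpha < 1$. The map $\nu \mapsto S^\nu(\delta)$ is continuous on its domain of definition inside the probability measures.

Proof. Let $\nu, \hat\nu$ be probability measures in $\mathcal{M}$ and assume that $S^\nu(\delta)$, $S^{\hat\nu}(\delta)$ exist. Let $r, s$ be such that $\nu_r = S^\nu(\delta)$ and $\hat\nu_s = S^{\hat\nu}(\delta)$. Of course both measures have the same barycenter, i.e. $B(r, \nu) = x = B(s, \hat\nu)$. Then, by the triangle inequality and the properties of $W$ and $B$ collected above,

\[ W(\nu_r, \hat\nu_s) \le W(\nu_r, \nu_s) + W(\nu_s, \hat\nu_s) = \alpha |B(r, \nu) - B(s, \nu)| + W(\nu_s, \hat\nu_s) \]
\[ = \alpha |B(s, \hat\nu) - B(s, \nu)| + W(\nu_s, \hat\nu_s) \]
\[ \le W(\nu, \hat\nu) + W(\nu, \hat\nu) = 2 W(\nu, \hat\nu). \quad \Box \]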

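Lemma 4.11. Let $\delta$ be an atom and assume $\delta \preceq_E \eta$, where $\eta \le \nu$. Then we have

\[ \eta - S^\eta(\delta) \le \nu - S^\nu(\delta). \]

Proof. As explained in Example 4.7 there exists an open interval $Q$ of $[0, \nu(\mathbb{R})]$ such that $S^\nu(\delta)$ is $(G_\nu)_\# \lambda_Q$ and another interval $Q'$ such that $S^\eta(\delta) = (G_\nu)_\# \lambda_{Q'} \wedge \eta$. The fact that these measures have the same mass and the same barycenter implies that $Q \subseteq Q'$. It follows that $\eta - S^\eta(\delta)$ is smaller than $\nu - S^\nu(\delta)$ on $G_\nu(\mathbb{R} \setminus Q')$, on $G_\nu(Q' \setminus Q)$ and on $G_\nu(Q)$. □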

Lemma 4.12 (Shadow of one atom and one measure). Consider now $\delta + \gamma$ where $\delta$ is an atom. Assume $(\delta + \gamma) \preceq_E \nu$. Then we have $\gamma \preceq_E \nu - S^\nu(\delta)$ and

(10) $S^\nu(\delta + \gamma) = S^\nu(\delta) + S^{\nu - S^\nu(\delta)}(\gamma)$.

Proof. We first prove that $\gamma$ is smaller than $\nu' := \nu - S^\nu(\delta)$ in the extended order. Note that there exists an interval $I$ such that $S^\nu(\delta)$ is concentrated on $I$ and $\nu'(I) = 0$. Let $\varphi$ be a non-negative convex function which satisfies $\limsup_{|x| \to \infty} |\varphi(x)/x| < +\infty$. We will prove $\int \varphi \, d\gamma \le \int \varphi \, d\nu'$. For that we introduce $\psi$ which equals $\varphi$ on $\mathbb{R} \setminus I$ and is linear on $I$. We can assume that $\psi$ is convex and $\psi \ge \varphi$ (even if $I$ is unbounded). Note that $\varphi$ and $\psi$ coincide on the border of $I$. We have

\[ \int \varphi \, d\gamma \le \int \psi \, d\gamma \le \int \psi \, d\nu - \int \psi \, d\delta. \]

But $\int \psi \, d\delta$ is $\int \psi \, dS^\nu(\delta)$ because $\psi$ is linear on $I$, and this is greater than $\int \varphi \, dS^\nu(\delta)$. Moreover $\int \psi \, d\nu' = \int \varphi \, d\nu'$ because $\nu'$ is concentrated on $\mathbb{R} \setminus I$. It follows that

\[ \int \varphi \, d\gamma \le \int \psi \, d\nu - \int \psi \, d\delta \le \int \varphi \, d\nu'. \]

As in the case of the usual convex order, it is of course sufficient to test against convex functions of linear growth, hence $\gamma \preceq_E \nu'$.

It remains to establish (10). It is clear (see for instance Example 2.3) that both sides of the equation are greater than $\delta + \gamma$ in the convex order and smaller than $\nu$ as functions of events. Hence by the definition of the minimal shadow it follows that $S^\nu(\delta + \gamma) \preceq_C S^\nu(\delta) + S^{\nu - S^\nu(\delta)}(\gamma)$. The other inequality is shown as follows: we will prove that for every $\eta \le \nu$ with $\delta + \gamma \preceq_C \eta$ we have $S^\nu(\delta) + S^{\nu - S^\nu(\delta)}(\gamma) \preceq_C \eta$. In fact if $\delta + \gamma \preceq_C \eta$ then

\[ \eta = S^\eta(\delta) + S^{\eta - S^\eta(\delta)}(\gamma) \]

(note that we have already proved that all terms exist in this decomposition). But it follows from $\eta \le \nu$ and $\eta - S^\eta(\delta) \le \nu - S^\nu(\delta)$ (proved in Lemma 4.11) that $F^\eta_\delta \subseteq F^\nu_\delta$ and $F^{\eta - S^\eta(\delta)}_\gamma \subseteq F^{\nu - S^\nu(\delta)}_\gamma$ so that $S^\nu(\delta) \preceq_C S^\eta(\delta)$ and $S^{\nu - S^\nu(\delta)}(\gamma) \preceq_C S^{\eta - S^\eta(\delta)}(\gamma)$. As in Example 2.3 the compatibility of sum and convex order concludes the proof. □

Lemma 4.13 (Shadow of finitely many atoms). Let $(\delta_i)_i$ be a family of atoms at points $x_i$ and of mass $\alpha_i \in [0, +\infty[$ (where we allow the weight $\alpha_i$ to be 0). For every $n \ge 1$ let $\mu_n = \delta_1 + \cdots + \delta_n$. Assume that $\mu_n \preceq_E \nu$. Then, we can construct a sequence $(\nu_n)_{n \in \mathbb{N}}$ such that

• $\nu_0 = 0$,

• $\nu_n = \nu_{n-1} + S^{\nu - \nu_{n-1}}(\delta_n)$ for every $n \ge 1$.

Moreover we claim that $\nu_n = S^\nu(\mu_n)$.

Proof. The lemma is proved by induction. The basis holds with $\nu_1 = S^\nu(\delta_1)$. Fix $n \ge 1$ and assume that $\nu_i$ have been constructed for $i \le n$ and satisfy $\nu_i = S^\nu(\mu_i)$. Let us now consider $\mu_{n+1} = \delta_1 + \mu'_n$ where $\mu'_n = \sum_{i=2}^{n+1} \delta_i$. As $\mu_{n+1} \preceq_E \nu$ we can apply Lemma 4.12 to the pair $(\delta_1, \mu'_n)$. So $\mu'_n$ can be convexly embedded in $\nu' = \nu - S^\nu(\delta_1)$ and

(11) $S^\nu(\mu_{n+1}) = S^\nu(\delta_1) + S^{\nu'}(\mu'_n)$.

But because of the inductive hypothesis applied to $\mu'_n$ and $\nu'$, the shadow $S^{\nu'}(\mu'_n)$ is equal to a measure $\nu'_n = \sum_{i=1}^{n} (\nu'_i - \nu'_{i-1})$ where $\nu'_i - \nu'_{i-1}$ is the shadow of $\delta_{i+1}$ in $\nu' - \nu'_{i-1}$ and $\nu'_0 = 0$. It follows that for every $i \le n$ we have $\nu_i = \nu_1 + \nu'_{i-1}$.

Let us define $\nu_{n+1}$ as the shadow of $\mu_{n+1}$ in $\nu$. As a consequence of (11) it equals $\nu_1 + \nu'_n = \nu_n + S^{\nu' - \nu'_{n-1}}(\delta_{n+1})$. But $\nu' - \nu'_{n-1}$ is the same as $\nu - \nu_n$, which concludes the proof. □

Remark 4.14. An important consequence of the lemma above is that $\nu_n - \nu_k$ is the shadow of $\mu_n - \mu_k$ in $\nu - S^\nu(\mu_k)$. Even though the above construction is of inductive nature, when permuting the first $n$ atoms, the measure $\nu_n = \sum_{i=1}^{n} (\nu_i - \nu_{i-1})$ is always the same: it is simply $S^\nu(\mu_n)$. The same facts apply to Proposition 4.17 below.

Proposition 4.15. Assume that $(\mu_n)_n$ is increasing in the convex order and $\mu_n \preceq_E \nu$ for every $n \in \mathbb{N}$. Then both $(\mu_n)_n$ and $(S^\nu(\mu_n))_n$ converge weakly. If we call $\mu_\infty$, respectively $S_\infty$, the limits, then the measure $S_\infty$ is the shadow of $\mu_\infty$ in $\nu$.

Proof. On the one side, the assumptions imply $u_{\mu_1} \le u_{\mu_2} \le \cdots \le u_{\mu_n} \le \cdots$ and $u_{\mu_n} \le u_\nu$. The limit $u_\infty := \lim_{n \in \mathbb{N}} u_{\mu_n}$ exists because for every $x \in \mathbb{R}$, $(u_{\mu_n}(x))_n$ is increasing and bounded from above. Of course the limit $u_\infty$ is a convex function and since $u_\nu$ is an upper bound it has the correct asymptotic behavior. Therefore $u_\infty$ is a potential function and because of Proposition 4.1 it is the potential function of some $\mu_\infty \in \mathcal{M}$ with the same mass and mean as the $\mu_n$'s.

On the other hand, for $n \in \mathbb{N}$ we consider the set $F^\nu_{\mu_n}$ of measures $\eta_n$ with $\mu_n \preceq_C \eta_n$ and $\eta_n \le \nu$. (We are using the notations of the proof of Lemma 4.6.) The measure $S^\nu(\mu_n)$ is the smallest element of $F^\nu_{\mu_n}$ for the convex order. The family
F'^^ is decreasing with n and v G F^ := Pl-^X, that F^ is not empty. It has 
also a smallest element and this equals /loo- D 

Lemma 4.16 (Shadow of one measure and one atom). Consider now $\gamma + \delta$ where $\delta$ is an atom. Assume $(\gamma + \delta) \preceq_E \nu$. Then we have $\delta \preceq_C S^\nu(\gamma + \delta) - S^\nu(\gamma)$ and

(12) $S^\nu(\gamma + \delta) = S^\nu(\gamma) + S^{\nu - S^\nu(\gamma)}(\delta)$.

Proof. If $\gamma$ is the sum of finitely many atoms, the property holds because it is possible to construct recursively $S^\nu(\gamma + \delta)$ using a decomposition with the first atom of $\gamma$, as has been done in Lemma 4.13. Let us consider an approximating sequence $(\gamma^{(n)})_n$ of $\gamma$ as in Lemma 2.9.

We can write the decomposition of the shadow of $\gamma^{(n)} + \delta$ in $\nu$ as in the statement of the lemma and apply Proposition 4.15 to the sequence $(S^\nu(\gamma^{(n)}))_n$. It follows that the limit exists and equals $S^\nu(\gamma)$. Write $\nu^{n}$ for $S^\nu(\gamma^{(n)})$ and $\nu^{\infty}$ for $S^\nu(\gamma)$. For the same reasons as above the shadows of $\gamma^{(n)} + \delta$ converge to $S^\nu(\gamma + \delta)$.

We still have to show that $S^{\nu - \nu^{n}}(\delta)$ converges to $S^{\nu - \nu^{\infty}}(\delta)$. We know that $\nu^{n}$ converges to $\nu^{\infty}$ in $\mathcal{M}$ so $\nu - \nu^{n}$ tends to $\nu - \nu^{\infty}$ and all these measures are bounded by $\nu$ (in particular they have a density smaller than 1 with respect to $\nu$).

We also know from Example 4.7 that $S^{\nu - \nu^{n}}(\delta)$ is the restriction of $\nu - \nu^{n}$ to the (uniquely determined) "quantile interval" with the correct mass and barycenter.

Rescaling masses if necessary, the continuity Lemma 4.10 implies that $S^{\nu - \nu^{n}}(\delta)$ converges to $S^{\nu - \nu^{\infty}}(\delta)$. □

We are now finally in the position to prove the desired associativity property of the shadow mapping.

Proof of Theorem 4.8. If $\gamma_2$ is the sum of finitely many atoms, the property holds because by Lemma 4.16 it is possible to construct recursively $S^\nu(\gamma_1 + \gamma_2)$ using a decomposition with one atom from $\gamma_2$ and the rest of $\gamma_1 + \gamma_2$ as the second measure. Let us consider a sequence $\gamma_2^{(n)}$ of measures made of finitely many atoms that weakly converge to $\gamma_2$. Moreover we assume that $(\gamma_2^{(n)})_n$ is increasing in the convex order as in Lemma 2.9.

We can write the decomposition of the shadow of $\gamma_1 + \gamma_2^{(n)}$ in $\nu$ as in the statement of the theorem and apply Proposition 4.15 to the sequence $(S^{\nu - S^\nu(\gamma_1)}(\gamma_2^{(n)}))_n$. We obtain that the limit exists and equals $S^{\nu - S^\nu(\gamma_1)}(\gamma_2)$. For the same reasons the shadow of $\gamma_1 + \gamma_2^{(n)}$ converges to $S^\nu(\gamma_1 + \gamma_2)$. This concludes the proof. □

Proposition 4.17 (Shadow of the sum of finitely many measures). Let $(\gamma_i)_i$ be a family of measures (that possibly vanish identically). Let $\mu_n = \gamma_1 + \cdots + \gamma_n$. Assume also $\mu_n \preceq_E \nu$ for every $n \ge 1$. There exists a unique sequence $(\nu_n)_{n \in \mathbb{N}}$ such that

• $\nu_0 = 0$,

• $\nu_n - \nu_{n-1} = S^{\nu - \nu_{n-1}}(\gamma_n)$.

Moreover we have $\nu_n = S^\nu(\mu_n)$.

Proof. The statement is the same as Lemma 4.13 except that we do not require the measures $\gamma_i$ to be atoms. Lemma 4.13 relies on Lemma 4.12 which characterizes the shadow of $\gamma_1 + \gamma_2$ under the assumption that $\gamma_1$ is an atom. Substituting it with Theorem 4.8 the present claim follows verbatim. □



We now formally define the left-curtain coupling $\pi_{lc}$ that has been discussed in the introduction and whose properties will be derived in the sequel. It is called the "left-curtain transport plan" because it projects shadow measures as a curtain that one closes starting from the left side.

Theorem 4.18 (Definition of $\pi_{lc}$). Assume that $\mu \preceq_C \nu$. There exists a unique martingale transport plan $\pi$ which transports $\mu|_{]-\infty, x]}$ to $S^\nu(\mu|_{]-\infty, x]})$ for every $x \in \mathbb{R}$, i.e. such that $\operatorname{proj}^y_\#(\pi|_{]-\infty, x] \times \mathbb{R}}) = S^\nu(\mu|_{]-\infty, x]})$. We denote this martingale transport plan by $\pi_{lc}$.

Proof. Plainly, the condition given in the statement prescribes the value of

\[ \pi(]-\infty, x] \times A) = S^\nu(\mu|_{]-\infty, x]})(A) \]

for $x \in \mathbb{R}$ and every Borel set $A \subseteq \mathbb{R}$, thus giving rise to a unique measure on the product space. The martingale property is also straightforward. □

Remark 4.19. The same idea can be applied if $O$ is some ordered set and $(I_x)_{x \in O}$ a family that spans the Borel $\sigma$-field. For instance we can consider $O = [0, +\infty[$ with $I_0 = [0, 0]$, $I_x = [a(x), b(x)]$ where $a, b$ are continuous, $a$ decreasing, $b$ increasing, and $\mathbb{R} = \bigcup_x I_x$.

Example 4.20. In the case of a finitely supported measure $\mu = \sum_{i=1}^n \delta_i$, it follows that if the ordering is done such that the support of $\delta_i$ is $\{x_i\}$ with $x_1 < \cdots < x_n$, the $\pi_{lc}$-coupling is $\pi_{lc} = \sum_{i=1}^n \bar\delta_i \otimes S^{\nu - \nu_{i-1}}(\delta_i)$ where $\bar\delta_i = \delta_i / \delta_i(\{x_i\})$ are the properly renormalized versions of $\delta_i$ and $\nu_i$ denote the same measures as in Lemma 4.13.
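The recursion of Example 4.20 can be carried out exactly when both measures are finitely supported. The sketch below is an illustration only, under several assumptions: measures are lists of (position, mass) pairs, the names `shadow_of_atom` and `left_curtain` are hypothetical, and the convex order relation between the two measures is not checked beforehand (an error is raised only if some atom has no shadow in the remaining part of the target). The atoms of the first measure are treated from left to right, and each one is sent to its shadow in what is left of the target, computed as a quantile window.

```python
from fractions import Fraction


def shadow_of_atom(nu, x, a):
    """Quantile window of nu with mass a and barycenter x, or None."""
    nu = sorted((Fraction(y), Fraction(m)) for y, m in nu if m > 0)
    x, a = Fraction(x), Fraction(a)
    total = sum(m for _, m in nu)
    if a > total:
        return None
    cum = [Fraction(0)]
    for _, m in nu:
        cum.append(cum[-1] + m)

    def window(s):
        parts = []
        for (y, _), lo, hi in zip(nu, cum, cum[1:]):
            w = min(hi, s + a) - max(lo, s)
            if w > 0:
                parts.append((y, w))
        return parts

    def moment(s):
        return sum(y * w for y, w in window(s))

    t = total - a
    bps = sorted({s for c in cum for s in (c, c - a) if 0 <= s <= t} | {Fraction(0), t})
    target = a * x
    if not (moment(bps[0]) <= target <= moment(bps[-1])):
        return None
    for s0, s1 in zip(bps, bps[1:]):
        m0, m1 = moment(s0), moment(s1)
        if m0 <= target <= m1:
            s = s0 if m1 == m0 else s0 + (s1 - s0) * (target - m0) / (m1 - m0)
            return window(s)
    return window(bps[0])


def left_curtain(mu, nu):
    """Left-curtain coupling of finitely supported mu and nu as {(x, y): mass}."""
    remaining = {}
    for y, m in nu:
        remaining[Fraction(y)] = remaining.get(Fraction(y), 0) + Fraction(m)
    plan = {}
    for x, a in sorted(mu):  # atoms of mu from left to right
        shadow = shadow_of_atom(list(remaining.items()), x, a)
        if shadow is None:
            raise ValueError("atom has no shadow in the remaining target measure")
        for y, w in shadow:
            plan[(x, y)] = plan.get((x, y), 0) + w
            remaining[y] -= w  # remove the shadow before treating the next atom
    return plan
```

As a usage example, take $\mu$ with mass 1/2 at each of -1 and 1, and $\nu$ with mass 1/3 at each of -2, 0 and 2. The atom at -1 is sent to mass 1/4 at -2 and 1/4 at 0, and the atom at 1 receives everything that remains.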



Theorem 4.21. The martingale coupling $\pi_{lc}$ is left-monotone in the sense of Definition 1.4.

Proof. Note that $\pi_{lc}$ is simultaneously a minimizer for all the cost functions $c_{s,t}(x, y) = 1_{]-\infty, s]}(x) \, |y - t|$ for arbitrary real numbers $s$ and $t$. Indeed if $\pi$ is an arbitrary martingale transport plan then

\[ \iint c_{s,t}(x, y) \, d\pi(x, y) = \iint_{]-\infty, s] \times \mathbb{R}} |y - t| \, d\pi(x, y) = \int |y - t| \, d(\operatorname{proj}^y_\# \pi|_{]-\infty, s] \times \mathbb{R}})(y). \]

By the properties of the shadow mapping this quantity is larger than or equal to

\[ \int |y - t| \, dS^\nu(\mu|_{]-\infty, s]})(y); \]

equality holds for all $s, t \in \mathbb{R}$ if (and only if) $\pi = \pi_{lc}$.

Applying Lemma 1.11 to the costs $c_{s,t}$ for $s, t \in \mathbb{Q}$, we obtain a Borel set $\Gamma_{s,t}$ of $\pi_{lc}$-measure 1. Set $\Gamma = \bigcap_{s,t \in \mathbb{Q}} \Gamma_{s,t}$. We claim that a configuration forbidden by Definition 1.4 cannot appear in $\Gamma$. Indeed if $(x, y^-)$, $(x, y^+)$ and $(x', y')$ are in $\Gamma$ and satisfy $x < x'$ and $y^- < y' < y^+$, they are also in $\Gamma_{s,t}$ where the rational pair $(s, t)$ satisfies $s \in ]x, x'[$ and $t \in ]y', y^+[$. Let $\lambda \in ]0, 1[$ be such that $y' = \lambda y^+ + (1 - \lambda) y^-$. The measure $\alpha = \lambda \delta_{(x, y^+)} + (1 - \lambda) \delta_{(x, y^-)} + \delta_{(x', y')}$ is concentrated on $\Gamma$ but the competitor $\alpha' = \lambda \delta_{(x', y^+)} + (1 - \lambda) \delta_{(x', y^-)} + \delta_{(x, y')}$ leads to a lower global cost. This yields the desired contradiction. □
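The cost comparison at the end of this proof can be verified by a short computation. The following lines are a sketch of that check, using only the relations $x \le s < x'$, $y^- < y' < t < y^+$ and $y' = \lambda y^+ + (1-\lambda) y^-$ from the proof; the symbols $\alpha$ and $\alpha'$ denote the two measures introduced there.

```latex
% Only points with first coordinate <= s contribute to c_{s,t}.
\begin{align*}
\int c_{s,t} \, d\alpha
  &= \lambda \, |y^+ - t| + (1-\lambda) \, |y^- - t|
   = \lambda (y^+ - t) + (1-\lambda)(t - y^-), \\
\int c_{s,t} \, d\alpha'
  &= |y' - t| = t - y'
   = \lambda (t - y^+) + (1-\lambda)(t - y^-), \\
\int c_{s,t} \, d\alpha - \int c_{s,t} \, d\alpha'
  &= 2 \lambda (y^+ - t) > 0 .
\end{align*}
% \alpha' is a competitor of \alpha: both have mass 1 at x and at x',
% the same second marginal, and conditional barycenter y' at both x and x'.
```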



5. Uniqueness of the monotone martingale transport 

In this section, we establish that the left-curtain coupling $\pi_{lc}$ is the unique monotone martingale coupling. Our proof is adapted to our specific setup. We will also explain a more classical argument that is often invoked in optimal transport theory to establish uniqueness properties. This so-called half sum argument will be used several times subsequently but requires the initial distribution $\mu$ to be continuous.

We start with three preliminary lemmas which are required to derive the main result of this part, Theorem 5.4.

Lemma 5.1. If $\mu \preceq_C \nu$ then one of the following statements holds true:

• we have $\mu(]a, +\infty[) > 0$ for every $a$;

• there exists $a \in \mathbb{R}$ such that $\mu(]a, +\infty[) = 0$ and $\nu(]a, +\infty[) > 0$;

• there exists $a \in \mathbb{R}$ such that $\mu(]a, +\infty[) = 0 = \nu(]a, +\infty[)$ and $\nu(\{a\}) \ge \mu(\{a\})$.

The corresponding property for intervals $]-\infty, b[$ is true as well.

Proof. If we are not in the first case, there exists a point $a$ such that $\mu(]a, +\infty[) = 0$. Let us take the smallest of these $a$, i.e. the supremum of the support of $\mu$. Integrating $x \mapsto (x - a')_+$ for different values of $a' < a$ we thus obtain $\sup(\operatorname{spt}(\nu)) \ge a$. If this inequality is strict we are in the second case. If there is equality let us prove that we are in the third case: if $\mu(\{a\}) = 0$ we are done. If $\mu(\{a\}) > 0$, the conditional transport measure $\pi_a$ must be the static transport because $\sup(\operatorname{spt}(\nu)) = a$. Hence the third case applies. □

We recall from Section 4 that if $\mu\preceq_E\nu$ then $F^{\nu}_{\mu}$ denotes the set of measures $\eta$ such that $\mu\preceq_c\eta$ and $\eta\le\nu$. As a consequence of Lemma 5.1 we have the following:

Lemma 5.2. Let $\mu$, $\nu$ and $\nu_2$ be finite measures and assume that $\mu\preceq_E\nu$. If there exist $\eta\in F^{\nu}_{\mu}$ and $d\in\mathbb{R}$ such that $\eta$ is concentrated on $]-\infty,d]$ and $\nu_2$ is concentrated on $[d,+\infty[$ then the shadows $S^{\nu+\nu_2}(\mu)$ and $S^{\nu}(\mu)$ are equal. Both are concentrated on $]-\infty,d]$ and $S^{\nu}(\mu)(\{d\})\le\eta(\{d\})$.

Proof. It is clear that $S^{\nu+\nu_2}(\mu)\preceq_c S^{\nu}(\mu)$. We have also $S^{\nu+\nu_2}(\mu)\preceq_c\eta$ so that we can apply Lemma 5.1 to this pair. We are clearly not in the first situation because the support of $\eta$ is bounded on the right. Hence the assertion of the second or the third case applies. In either case $\eta$ and $S^{\nu+\nu_2}(\mu)$ are concentrated on $]-\infty,d]$. The shadow is smaller than $(\nu+\nu_2)|_{]-\infty,d]}=\nu|_{]-\infty,d]}+\nu_2(\{d\})\delta_d$ and considering carefully the third case, $S^{\nu+\nu_2}(\mu)\le\nu$. Finally we obtain $S^{\nu+\nu_2}(\mu)=S^{\nu}(\mu)$. □



For every pair $(u,v)$ with $u<v$ let $g_{u,v}$ be defined by

(13)  $g_{u,v}(x)=\begin{cases}v-x & \text{if } x\in[u,v],\\ 0 & \text{otherwise.}\end{cases}$



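Lemma 5.3. Let $\sigma$ be a non-trivial signed measure of mass $0$ and denote its Hahn decomposition by $\sigma=\sigma^+-\sigma^-$. There exist $a\in\mathrm{spt}(\sigma^+)$ and $b>a$ such that $\int g_{a,b}(x)\,d\sigma(x)>0$.

Proof. First notice that $u\mapsto\int g_{u,u+1}(x)\,d\sigma(x)$ does not vanish identically. Since

$$\iint g_{u,u+1}(x)\,d\sigma(x)\,du=0,$$

there exists $a\in\mathbb{R}$ such that $\int g_{a,a+1}(x)\,d\sigma(x)>0$. The set $\mathrm{spt}(\sigma^+)\cap[a,a+1]$ cannot be empty, so let $b=\min(\mathrm{spt}(\sigma^+)\cap[a,a+1])$. Since $\sigma\le 0$ on $[a,b[$ and $g_{a,a+1}\ge 0$, it follows that

$$0<\int g_{a,a+1}\,d\sigma\le\int g_{b,a+1}\,d\sigma,$$

so the pair $(b,a+1)$ has the required property. □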
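The vanishing of the double integral used in this proof can be checked by hand. The following computation is a sketch under the assumption that $\sigma$ is a finite signed measure, so that Fubini's theorem applies; the substitution variable $w$ is introduced only for this illustration.

```latex
\int_{\mathbb{R}} g_{u,u+1}(x)\,du
  = \int_{x-1}^{x} (u+1-x)\,du
  = \int_{0}^{1} w\,dw
  = \tfrac12
  \quad\text{for every } x\in\mathbb{R},
\qquad\text{hence}\qquad
\int_{\mathbb{R}}\int_{\mathbb{R}} g_{u,u+1}(x)\,d\sigma(x)\,du
  = \tfrac12\,\sigma(\mathbb{R}) = 0 .
```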

Theorem 5.4 (Uniqueness of the monotone martingale coupling). Let $\pi$ be a monotone martingale transport plan and $\mu=\mathrm{proj}^x_\#\pi$ and $\nu=\mathrm{proj}^y_\#\pi$. Then $\pi$ is the left-curtain coupling $\pi_{lc}$ from $\mu$ to $\nu$.



Proof. Let $\pi$ be left-monotone with monotonicity set $\Gamma$ as in Definition 1.4 and let $\pi_{lc}$ be the left-curtain transport plan between $\mu$ and $\nu$. We consider the target measures $\nu_x^{\pi}$ and $\nu_x^{lc}$ obtained when transporting the $\mu$-mass of $]-\infty,x]$ into $\nu$, i.e.

$$\nu_x^{\pi}=\mathrm{proj}^y_\#\,\pi|_{]-\infty,x]\times\mathbb{R}}$$

and

$$\nu_x^{lc}=S^{\nu}(\mu|_{]-\infty,x]})=\mathrm{proj}^y_\#\,\pi_{lc}|_{]-\infty,x]\times\mathbb{R}}.$$

If $\nu_x^{\pi}=\nu_x^{lc}$ for every $x$ then $\pi=\pi_{lc}$ by the definition of the curtain coupling in Theorem 4.18.

Assume for contradiction that there exists some $x$ with $\nu_x^{\pi}\ne\nu_x^{lc}$. This means in particular that $\sigma_x=(\nu_x^{lc}-\nu_x^{\pi})\ne 0$. The shadow property implies that $\nu_x^{lc}\preceq_c\nu_x^{\pi}$.



By Lemma 5.3 we can pick $u\in\mathrm{spt}(\sigma_x^+)$ and $v>u$ such that $\int g_{u,v}\,d\sigma_x>0$. As $u\in\mathrm{spt}(\sigma_x^+)$, there is a sequence $(x'_n,u_n)_n$ such that

• $(x'_n,u_n)\in\Gamma$,
• $x'_n>x$,
• $u_n\to u$.

As $\pi$ is monotone, for every $t\le x$ and $n\in\mathbb{N}$, the set $\Gamma_t$ cannot intersect both $]-\infty,u_n[$ and $]u_n,+\infty[$. Hence for $t\le x$,

(14)  $\Gamma_t\,\cap\,]-\infty,u[\;=\emptyset$  or  $\Gamma_t\,\cap\,]u,+\infty[\;=\emptyset$.

This remark will be important in the sequel of the proof. 

We distinguish two cases depending on the respective positions of u and x. 

(1) First case: u < x. Note that we have 



and 



As a consequence of (14), $\pi$ transports the mass of $]-\infty,u]$ to $]-\infty,u]$ and the mass of $]u,x]$ to $[u,+\infty[$. Let us prove that the same applies to $\pi_{lc}$.



First observe that by Lemma 5.1 ttic transports the mass of ] — oo, m] to ] — 
oo, u] . Second, as it is possible to transport the mass of ]u, x] into V[u,+oo[ ~ 
cr„({u})(5„, for instance with the shadow projection S'^~^^ {^^\u,x\) = ^x~'^u^ 
the shadow of ij.]u.x] in — S'^ (fi-^^^ .^^) = v — v^^" is equal to the shadow 
of M]ti.a;] in ^[u,+oo[ Or in V — v^. In particular this measure is concentrated 
on the set [u, +oo[. 

Finally we have i^J''= :<c i^Z on the left side of u and (i/J'"^ — 
(i/J — v^) on the right side of u. Note that gu,v is convex on [u, +oo[ so that 
/ 9u,v <iiK'" - K'") ^ / 9u.v d{Vx -K)- The function is not convex on 



00, w] but / gu^vdv^'" < J 

9u,v(^^u due to Lemma 5.1 1 Summing these 
inequalities we obtain a contradiction to J gu.v dz^'J < jgu,v 'i'^x'"- 
(2) Second case: $u>x$. The measure $\pi$ cannot transport mass from $]-\infty,x]$ into $]u,+\infty[$. Indeed because of the martingale property it then would also transport mass to the set $]-\infty,u[$, contradicting (14). Applying Lemma 5.2, $\nu_x^{lc}$ is concentrated on the left of $u$ and $\int g_{u,v}\,d\nu_x^{\pi}\ge\int g_{u,v}\,d\nu_x^{lc}$, which is a contradiction to $\int g_{u,v}\,d\sigma_x>0$.

□

Remark 5.5. The two cases in the proof are actually not very different. In both of them, $\pi|_{]-\infty,x]\times\mathbb{R}}$ and $\pi_{lc}|_{]-\infty,x]\times\mathbb{R}}$ (roughly speaking the transport plans restricted to $\mu|_{]-\infty,x]}$) are concentrated on

$$(]-\infty,u]\times]-\infty,u])\cup(]u,+\infty[\times[u,+\infty[)$$

and this is the core of the argument.

5.1. Structure of the monotone martingale coupling. It remains to establish Corollary 1.6 which states that if $\mu$ is continuous, then $\pi_{lc}$ is concentrated on the graph of two functions. We need the following lemma.

Lemma 5.6. Assume that $\Gamma\subseteq\mathbb{R}^2$ is a Borel set such that for each $x\in\mathbb{R}$ we have $|\Gamma_x|\le 2$. Then $S=\mathrm{proj}^x(\Gamma)$ is a Borel set and there exist Borel functions $T_1,T_2:S\to\mathbb{R}$ with $T_1\le T_2$ such that

$$\Gamma=\mathrm{graph}(T_1)\cup\mathrm{graph}(T_2).$$
Proof. This is a consequence of [TTJ Theorem 18.11]. □ 
We can now complete the proof.

Proof of Corollary 1.6. Consider the left-curtain coupling $\pi_{lc}$ between measures $\mu\preceq_c\nu$, where $\mu$ is continuous. As $\pi_{lc}$ is left-monotone there exists a Borel monotonicity set $\Gamma$ as in Definition 1.4. Note that if $\mu(A)=0$, the set $\Gamma\setminus(A\times\mathbb{R})$ is still a monotonicity set. This applies in particular to all countable sets since $\mu$ is continuous.

With the notations of Lemma 3.2 let us show that $A=\{x\in\mathbb{R}:|\Gamma_x|\ge 3\}$ is countable. If not, we can apply this lemma and obtain $x\in\mathbb{R}$ with three points $y^-<y<y^+$ in the set $\Gamma_x$ that can be approximated from the right side. In particular there exists $(x',y')\in\Gamma$ with $x'>x$ and $y'\in\,]y^-,y^+[$, which is the forbidden configuration (3). Therefore $A$ is countable so that we can assume that $|\Gamma_x|\le 2$ for every $x$. Applying Lemma 5.6 we obtain the desired assertion. □



The following lemma makes it possible to obtain uniqueness of the optimal martingale transport plan, provided that we know that every optimal martingale transport is concentrated on the graphs of two mappings (see Section 7). We can apply it to the martingale transport plans when $\mu$ is continuous and recover the uniqueness of the monotone transport plan in this particular case.

Lemma 5.7. Let $\mu$ and $\nu$ be in convex order and $\mathcal{E}$ a non-trivial convex set of martingale transport plans. Assume that every $\pi\in\mathcal{E}$ is concentrated on some $\Gamma^{\pi}\subseteq\mathbb{R}^2$ with $|\Gamma^{\pi}_x|\le 2$ for every $x\in\mathbb{R}$. Then the set $\mathcal{E}$ consists of a single point.

Proof. Let $\pi$ and $\pi'$ be elements of $\mathcal{E}$. We consider $\bar\pi=\frac{\pi+\pi'}{2}\in\mathcal{E}$ and $\Gamma^{\bar\pi}$, which can be seen as the graph of two functions according to Lemma 5.6. The measures $\pi$ and $\pi'$ are also concentrated on $\Gamma^{\bar\pi}$. For two disintegrations $(\pi_x)_{x\in\mathbb{R}}$ and $(\pi'_x)_{x\in\mathbb{R}}$ with respect to $\mu$, we know that $\mu$-almost surely $\pi_x$ and $\pi'_x$ are probability measures concentrated on $\Gamma^{\bar\pi}_x$ and with the same barycenter, namely $x$. It follows that $\pi_x=\pi'_x$ $\mu$-almost surely, so that $\pi'=\pi$. □
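The last step of this proof rests on the elementary fact that a probability measure carried by at most two points is determined by its barycenter. A minimal worked version, writing the fibre as $\{T_1(x),T_2(x)\}$ in the spirit of Lemma 5.6 (the weight $p$ is notation introduced only for this sketch):

```latex
\pi_x = p\,\delta_{T_1(x)} + (1-p)\,\delta_{T_2(x)},
\qquad
p\,T_1(x) + (1-p)\,T_2(x) = x
\;\Longrightarrow\;
p = \frac{T_2(x)-x}{T_2(x)-T_1(x)}
\quad\text{whenever } T_1(x) < T_2(x),
```

while $\pi_x=\delta_x$ when $T_1(x)=T_2(x)$. Two kernels supported on the same two-point fibre with the same barycenter therefore coincide.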



6. Optimality properties of the monotone martingale transport

In this section we prove that $\pi_{lc}$ is the unique optimal coupling for the martingale optimal transport problem (2) associated to two different kinds of cost functions. The special case $c(x,y)=\exp(y-x)$ is in the intersection of these two families of cost functions.

Theorem 6.1. Assume that $c(x,y)=h(y-x)$ for some differentiable function $h$ whose derivative is strictly convex and that $c$ satisfies the sufficient integrability condition. If there exists a finite martingale transport plan, then $\pi_{lc}$ is the unique optimizer.

Proof. We have to show that every finite optimizer $\pi$ is monotone. Pick a set $\Gamma$ such that $\pi(\Gamma)=1$ and $\Gamma$ resists improvements by barycenter preserving reroutings as in Lemma 1.11. Pick $(x,y^-),(x,y^+),(x',y')\in\Gamma$. Striving for a contradiction we assume that they satisfy (3). Let us define a transport $\alpha$ on these edges and a competitor $\alpha'$ of it. We pick $\lambda\in\,]0,1[$ such that $\lambda y^++(1-\lambda)y^-=y'$. The measure $\alpha$ puts mass $\lambda$ on $(x,y^+)$, mass $1-\lambda$ on $(x,y^-)$ and mass $1$ on $(x',y')$. Our candidate for $\alpha'$ assigns mass $1-\lambda$ to $(x',y^-)$, mass $\lambda$ to $(x',y^+)$ and mass $1$ to $(x,y')$. Clearly $\alpha'$ is a competitor of $\alpha$. It leads to smaller costs if and only if

$$\lambda c(x,y^+)+(1-\lambda)c(x,y^-)+c(x',y')>\lambda c(x',y^+)+(1-\lambda)c(x',y^-)+c(x,y').$$

A sufficient condition for this is that

(15)  $d(x)=\lambda c(x,y^+)+(1-\lambda)c(x,y^-)-c(x,y')$

is strictly decreasing in $x$. In terms of $h$ the function $d$ can be written as

$$d(x)=\lambda h(y^+-x)+(1-\lambda)h(y^--x)-h(y'-x).$$

To have it decreasing it is sufficient that

$$0>d'(x)=-\lambda h'(y^+-x)-(1-\lambda)h'(y^--x)+h'(y'-x)$$

$$=h'\bigl(\lambda(y^+-x)+(1-\lambda)(y^--x)\bigr)-\bigl[\lambda h'(y^+-x)+(1-\lambda)h'(y^--x)\bigr].$$

Finally it is sufficient to know that $h'$ is strictly convex which holds by assumption.

□ 
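The rerouting argument in the proof of Theorem 6.1 can be checked numerically in the special case $h=\exp$, whose derivative is strictly convex. The sketch below is only an illustration: the function names and the sample points are assumptions of this example, not part of the paper. It computes the cost of the three-edge transport and of its competitor for a configuration with $x<x'$ and $y^-<y'<y^+$.

```python
import math


def cost(x, y):
    """Cost c(x, y) = h(y - x) with h = exp (h' = exp is strictly convex)."""
    return math.exp(y - x)


def rerouting_costs(x, x_prime, y_minus, y_plus, lam):
    """Return (cost of alpha, cost of alpha') for the configuration of the proof.

    alpha  : mass lam on (x, y+), 1 - lam on (x, y-), 1 on (x', y')
    alpha' : mass lam on (x', y+), 1 - lam on (x', y-), 1 on (x, y')
    where y' = lam * y+ + (1 - lam) * y-, so both measures have the same
    marginals and the same barycenters above x and x'.
    """
    y_prime = lam * y_plus + (1 - lam) * y_minus
    alpha = (lam * cost(x, y_plus) + (1 - lam) * cost(x, y_minus)
             + cost(x_prime, y_prime))
    alpha_prime = (lam * cost(x_prime, y_plus) + (1 - lam) * cost(x_prime, y_minus)
                   + cost(x, y_prime))
    return alpha, alpha_prime


if __name__ == "__main__":
    a, a_prime = rerouting_costs(x=0.0, x_prime=0.5, y_minus=-1.0, y_plus=2.0, lam=0.4)
    print("cost of alpha :", a)
    print("cost of alpha':", a_prime)
```

For $x<x'$ the competitor is expected to be strictly cheaper, which is exactly what forbids configuration (3) on the set $\Gamma$.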

We mention another class of cost functions for which the monotone martingale transport plan $\pi_{lc}$ is optimal.

Theorem 6.2. Let $\psi$ be a non-negative strictly convex function and $\varphi$ a non-negative decreasing function. Consider the cost function $c(x,y)=\varphi(x)\psi(y)\ge 0$. For two finite measures $\mu$ and $\nu$ in convex order the left-curtain coupling $\pi_{lc}$ is the unique optimal transport.

One could show that optimal martingale couplings are monotone in a very similar way as in the proof of Theorem 6.1. We prefer to give an alternative proof relying on the order properties of the left-curtain coupling.

Proof. Let $\pi$ be optimal for the problem and assume $\int c\,d\pi<+\infty$. We want to prove $\int c\,d\pi_{lc}\le\int c\,d\pi$ with equality if and only if $\pi=\pi_{lc}$. First of all note that for positive measurable functions $f$

$$\int f(x)\varphi(x)\,d\mu(x)=\int_0^{\infty}\Bigl(\int\mathbb{1}_{]-\infty,\varphi^{-1}(t)]}(x)\,f(x)\,d\mu(x)\Bigr)dt$$

where $\varphi^{-1}(t)$ means $\sup\{x\in\mathbb{R}:t\le\varphi(x)\}$. Taking $f(x)=\int\psi(y)\,d\pi_x(y)$ we obtain

(16)  $\iint c(x,y)\,d\pi(x,y)=\int_0^{\infty}\Bigl(\int\psi(y)\,d\nu^{\pi}_{\varphi^{-1}(t)}(y)\Bigr)dt$

where $\nu^{\pi}_u$ denotes $\mathrm{proj}^y_\#\,\pi|_{]-\infty,u]\times\mathbb{R}}$ as in the introduction or in Section 5. In particular $\nu^{lc}_u$ equals $S^{\nu}(\mu|_{]-\infty,u]})$. Of course the representation (16) remains true if we replace all occurrences of $\pi$ by $\pi_{lc}$.

The measures $\nu^{lc}_u$ and $\nu^{\pi}_u$ are in convex order and $\psi$ is strictly convex so that $\int\psi\,d\nu^{lc}_u\le\int\psi\,d\nu^{\pi}_u$ with equality if and only if the two measures are the same.

Finally it follows from (16) that $\pi$ is the left-curtain coupling. □



7. Other cost functions - other optimal martingale couplings 

In this section we use Lemma 1.11 to derive results that apply to general cost functions.

7.1. Cost functions of the form $c(x,y)=h(y-x)$.

Theorem 7.1. Assume that the cost function $c(x,y)$ is given by $h(y-x)$ for some function $h$ which is twice continuously differentiable. If affine functions $x\mapsto ax+b$ meet $h'(x)$ in at most $k$ points and $\pi$ is an optimal transport plan, then there exists a disintegration $(\pi_x)_{x\in\mathbb{R}}$ such that for any $x\in\mathbb{R}$ at least one of the two following statements holds:

$$\mu(\{x\})>0\quad\text{or}\quad\mathrm{card}(\mathrm{spt}(\pi_x))\le k.$$

In particular if $\mu$ is continuous then $\mathrm{card}(\mathrm{spt}(\pi_x))\le k$ is satisfied $\mu$-almost surely for any disintegration of $\pi$.



Proof. Let $\pi$ be optimal and $\Gamma$ according to Lemma 1.11. If there are only countably many continuity points $x$ of $\mu$ such that $\mathrm{card}(\Gamma_x)\ge k+1$, then we can remove them. Assume for contradiction that there are uncountably many. Applying Lemma 3.2 to the set

$$\tilde\Gamma=\{(x,y)\in\Gamma:\mu(\{x\})=0\}$$

we obtain $a\in\mathbb{R}$ and $b_0<\dots<b_k\in\tilde\Gamma_a$.

Let $a'\in\mathbb{R}$, $\lambda\in\,]0,1[$ and set $b_\lambda=(1-\lambda)b_0+\lambda b_k$. We will compare

(17)  $h(b_\lambda-a)+\lambda h(b_k-a')+(1-\lambda)h(b_0-a')$

and

(18)  $h(b_\lambda-a')+\lambda h(b_k-a)+(1-\lambda)h(b_0-a)$.

As $a'$ tends to $a$, $b_i-a'$ tends to $b_i-a$. Considering a Taylor expansion of $h$ at $b_i-a$ we find some $\varepsilon>0$ such that $|a-a'|<\varepsilon$ implies

$$\bigl|[h(b_i-a')-h(b_i-a)]-h'(b_i-a)\times(a-a')\bigr|\le|h''(b_i-a)|\,(a-a')^2$$

for $i\in\{0,\lambda,k\}$. Hence if we subtract (17) from (18) we obtain

(19)  $\bigl(h'(b_\lambda-a)-[(1-\lambda)h'(b_0-a)+\lambda h'(b_k-a)]\bigr)(a-a')$

up to an error of

$$\bigl[(1-\lambda)|h''(b_0-a)|+\lambda|h''(b_k-a)|+|h''(b_\lambda-a)|\bigr]\cdot(a-a')^2.$$

But $h'$ is not linear so that (19) is not identically zero. Moreover according to the assumption on $h'$ and the affine functions there is an index $i\in\{1,\dots,k-1\}$ such that if $b_\lambda=b_i$ and $a'\ne a$ then (19) is not zero. More precisely as $h''$ is continuous there exists some $\varepsilon_1<\varepsilon$ such that if $|b_i-b_\lambda|<\varepsilon_1$ and $0<|a-a'|<\varepsilon_1$ then the difference of (17) and (18) is not zero and its sign is determined by the one of $a-a'$.

Since $a,b_0,\dots,b_k$ were chosen according to Lemma 3.2 we may pick $a'$ and $b_\lambda\in\Gamma_{a'}$ such that $(a',b_\lambda)$ is sufficiently close to $(a,b_i)$ and $a'$ is on the correct side of $a$, making (17) smaller than (18). Setting

$$\alpha=\lambda\delta_{(a,b_k)}+(1-\lambda)\delta_{(a,b_0)}+\delta_{(a',b_\lambda)},$$

$$\alpha'=\lambda\delta_{(a',b_k)}+(1-\lambda)\delta_{(a',b_0)}+\delta_{(a,b_\lambda)},$$

we have thus found a competitor $\alpha'$ which has lower costs than $\alpha$, contradicting the choice of $\Gamma$. □

7.2. The cost function $h(y-x)$ in the usual setup. It seems worthwhile to mention the following variant of Theorem 7.1 that is part of the classical theory of problem (1), even if we are not aware that it has been noticed in the literature in this form. In fact for a family of special costs we can bound the number of parts the mass can split in while it is transported optimally. Note that this number is not attained for every pair $(\mu,\nu)$ (see [23]). The similarity with Theorem 7.1 relies on the fact that we want to count the number of intersection points of $\mathrm{graph}(h')$ with affine lines on the one hand, and with horizontal lines on the other hand.

Theorem 7.2. Let $k$ be a positive integer and let $h:\mathbb{R}\to\mathbb{R}$ be a twice continuously differentiable function such that the cost function $c:(x,y)\mapsto h(y-x)$ satisfies the sufficient integrability condition with respect to probability measures $\mu$ and $\nu$. Assume also that $C(\mu,\nu)<+\infty$.

If the equation $h'(x)=b$ has at most $k$ different solutions for $b\in\mathbb{R}$, then there exists a disintegration $(\pi_x)_{x\in\mathbb{R}}$ such that for $x\in\mathbb{R}$ at least one of the two statements

$$\mu(\{x\})>0\quad\text{or}\quad\mathrm{card}(\mathrm{spt}(\pi_x))\le k$$

holds. In particular if $\mu$ is continuous then $\mathrm{card}(\mathrm{spt}(\pi_x))\le k$ is satisfied $\mu$-almost surely for any disintegration.

7.3. (Counter)examples based on the cost function $c(x,y)=(y-x)^4$. In this section we give two counterexamples that distinguish the general behavior from the one of the curtain transport plan: the optimizer is in general not unique and it may very well split into more than two parts even if the starting distribution is continuous (see Corollary 1.6 resp. Theorem 7.1). Throughout this subsection we consider the cost function $c(x,y)=(y-x)^4$.

7.3.1. Example of non uniqueness of the transport. Let $\mu$ be uniformly distributed on $\{-1;1\}$ and $\nu$ uniformly distributed on $\{-2;0;2\}$. We denote $-1$ and $1$ by $(x_i)_{i=1,2}$ and $-2$, $0$ and $2$ by $(y_j)_{j=1,2,3}$. To any matrix $A=(a_{i,j})$ of two rows and three columns satisfying $\sum_j a_{i,j}=1/2$ and $\sum_i a_{i,j}=1/3$ we associate the transport plan defined by $\pi(\{(x_i,y_j)\})=a_{i,j}$. For such a transport plan the accumulated costs equal

$$\sum_{i,j}a_{i,j}\,(x_i-y_j)^4=(a_{1,1}+a_{1,2}+a_{2,2}+a_{2,3})+3^4\cdot(a_{1,3}+a_{2,1})=1+80\,(a_{1,3}+a_{2,1}).$$

The matrices associated to a martingale transport plan are

$$\begin{pmatrix}1/4&1/4&0\\1/12&1/12&1/3\end{pmatrix}+\lambda\begin{pmatrix}1/12&-1/6&1/12\\-1/12&1/6&-1/12\end{pmatrix}$$

where $\lambda\in[0,1]$. Therefore the martingale transport plan associated to the parameter $\lambda$ gives rise to total costs of $1+80(\lambda/12+1/12-\lambda/12)=23/3$, independently of $\lambda$. We conclude that every martingale transport plan is optimal.
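The non-uniqueness example can be verified with exact rational arithmetic. The sketch below is illustrative only: the helper names and the particular parametrization of the one-parameter family of martingale plans (including the sign convention for the parameter `lam`) are assumptions of this example. It checks the marginal and martingale constraints and evaluates the cost $\sum_{i,j}a_{i,j}(y_j-x_i)^4$ for several parameter values.

```python
from fractions import Fraction as Fr

XS = [-1, 1]        # support of mu, mass 1/2 at each point
YS = [-2, 0, 2]     # support of nu, mass 1/3 at each point


def plan(lam):
    """2x3 matrix of a martingale transport plan, lam in [0, 1] (assumed parametrization)."""
    lam = Fr(lam)
    return [
        [Fr(1, 4) + lam / 12, Fr(1, 4) - lam / 6, lam / 12],
        [Fr(1, 12) - lam / 12, Fr(1, 12) + lam / 6, Fr(1, 3) - lam / 12],
    ]


def is_martingale_plan(a):
    """Check non-negativity, both marginals and the martingale constraint."""
    nonneg = all(v >= 0 for row in a for v in row)
    rows_ok = all(sum(row) == Fr(1, 2) for row in a)
    cols_ok = all(a[0][j] + a[1][j] == Fr(1, 3) for j in range(3))
    # martingale property: sum_j a_ij * y_j = x_i * mu({x_i})
    mart_ok = all(
        sum(a[i][j] * YS[j] for j in range(3)) == XS[i] * Fr(1, 2)
        for i in range(2)
    )
    return nonneg and rows_ok and cols_ok and mart_ok


def total_cost(a):
    """Accumulated cost for c(x, y) = (y - x)^4."""
    return sum(a[i][j] * (YS[j] - XS[i]) ** 4 for i in range(2) for j in range(3))


if __name__ == "__main__":
    for k in range(5):
        lam = Fr(k, 4)
        print(lam, is_martingale_plan(plan(lam)), total_cost(plan(lam)))
```

Running the loop shows the same cost for every parameter value, in line with the conclusion that every martingale transport plan between these two measures is optimal.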

7.3.2. Example of splitting in exactly three points in the continuous case. Roughly speaking we have proved in Theorem 7.1 that if $\mu$ is continuous, $d\mu(x)$-mass elements split in at most three points. Indeed $t\mapsto t^4$ has derivative $t\mapsto 4t^3$ which is of degree 3. In this paragraph we give a numerical example showing that this upper bound is sharp. The construction is inspired by the dual theory of the martingale transport problem mentioned in Paragraph 2.4. Briefly, Figure 4 depicts a family of curves indexed by $x$. These curves touch three envelope curves at three moving points $y_1$, $y_2$ and $y_3$ close to $-1$, $0$ and $1$. The optimal martingale transport plan that we construct is supported by the union of the graphs $\Gamma_i=\{(x,y_i(x))\in\mathbb{R}^2\mid x\in\,]0,1/5[\}$ for $i=1,2,3$. Let $\psi:\mathbb{R}\to\mathbb{R}$ be defined by

(20)  $\psi(y)=y^4-\max_{x\in[0,1/2]}\bigl(4x(y+\tfrac{x}{2})(y+1-x)(y-1-x)\bigr).$

Figure 4. Graphs and envelope of the functions $y\mapsto F(x,y)$ for $x\in[0,1/5]$.

Hence for any (x, y) £ [0, 1/2] x M 

y"^ - ip{y) > Axy^ - 6x^y'^ + ai{x)y + 61 (a;), 

where ai{x) — Ax — 4x^ — Ax^ and bi{x) = 2x^ — 2x*. But y^ — {y ~ x)'^ -f Axy^ 
Gx^y"^ + a2[x)y + 62(0;) so that 

(21)  $(y-x)^4\ge a_3(x)+b_3(x)\,y+\psi(y)$

for $a_3=a_1-a_2$ and $b_3=b_1-b_2$. Here (21) is an equality at the point $(x_0,y_0)$ if and only if $\psi(y_0)$ is realized in (20) by $x=x_0$. Integrating (21) against a transport plan $\pi$ one obtains

$$\iint(y-x)^4\,d\pi(x,y)\ge\int a_3(x)\,d\mu(x)+\iint b_3(x)\,y\,d\pi(x,y)+\int\psi(y)\,d\nu(y)$$

and the equality holds if and only if $\pi$ is concentrated on

$$\{(x,y)\in[0,1/2]\times\mathbb{R},\ (y-x)^4=a_3(x)+b_3(x)y+\psi(y)\}.$$

Moreover as we are considering a martingale transport plan we have

$$\iint(y-x)^4\,d\pi(x,y)\ge\int a_3(x)\,d\mu(x)+\int b_3(x)\,x\,d\mu(x)+\int\psi(y)\,d\nu(y).$$

Here the lower bound on the right-hand side is the same for every martingale transport plan $\pi$. It follows that martingale transport plans concentrated on $\{(x,y)\in[0,1/2]\times\mathbb{R},\ (y-x)^4=a_3(x)+b_3(x)y+\psi(y)\}$ are optimal with respect
to their marginals. We set $F(x,y)=4x(y+\tfrac{x}{2})(y+1-x)(y-1-x)$ so that (20) is $\psi(y)=y^4-\sup_{x\in[0,1/2]}F(x,y)$. In Figure 4 one can see the graphs of $F(x,\cdot)$ for values of $x$ between $0$ and $1/5$.

We will prove that for $y\in\,]-1,0[\,\cup\,]1,2[$, $F(\cdot,y):[0,1/2]\to\mathbb{R}$ has a unique global maximum in $]0,1/2[$. Actually $F(\cdot,y)$ has main term $2x^4$. Therefore it is sufficient to prove that $\partial_xF(\cdot,y)$ is positive for $x=0$ and negative for $x=1/2$. Indeed this means that we are analyzing the variation of the polynomial function $F(\cdot,y)$ of degree 4 on an interval where its variations are different from the asymptotic ones. In particular $F(\cdot,y)$ will have a unique maximum on $]0,1/2[$. This turns out to be true. Indeed

(22)  $\partial_xF(x,y)=4\bigl((x+y)[(x-y)^2-1]+x(x+2y)(x-y)\bigr),$

so that for any parameter $y$ in $]-1,0[\,\cup\,]1,2[$, the function $\partial_xF(\cdot,y)$ is positive in $x=0$ since it equals $4y(y^2-1)$ there. For $x=1/2$, straightforward considerations show that $\partial_xF(1/2,y)$ is negative for all $y\in\,]-\infty,2]$.

We will now show that for a given parameter $x\in\,]0,1/5[$, $x$ is the maximum of $F(\cdot,y)$ on $[0,1/2]$ for exactly three elements $y$ of $]-1,0[\,\cup\,]1,2[$. For this purpose we consider $y\mapsto\partial_xF(x,y)$. We prove that it vanishes exactly three times on $]-1,0[\,\cup\,]1,2[$. For fixed $x\in\,]0,1/5[$ this function is indeed negative in $0$ and $-1$ while it is positive in $-1/2$. The sign is also different for $y=1$ and $y=2$ so that we have found the three zeros of $y\mapsto\partial_xF(x,y)$. But as explained in the previous step, for $y\in\,]-1,0[\,\cup\,]1,2[$ being a maximum of $F(\cdot,y)$ is exactly the same as having zero derivative.
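The sign pattern described above can be reproduced numerically. The following sketch uses formula (22) for $\partial_xF$ and a plain bisection on the three brackets $[-1,-1/2]$, $[-1/2,0]$ and $[1,2]$; the function names and the tolerance are assumptions of this illustration. Since $y\mapsto\partial_xF(x,y)$ is a cubic polynomial in $y$, three sign changes account for all of its zeros.

```python
def dF(x, y):
    """Partial derivative of F with respect to x, as given in formula (22)."""
    return 4 * ((x + y) * ((x - y) ** 2 - 1) + x * (x + 2 * y) * (x - y))


def bisect_zero(g, lo, hi, tol=1e-13):
    """Find a zero of g in [lo, hi] by bisection; requires a sign change on the bracket."""
    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("no sign change on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)


def three_zeros(x):
    """Zeros y_1 < y_2 < y_3 of y -> dF(x, y), for a parameter x in ]0, 1/5[."""
    brackets = [(-1.0, -0.5), (-0.5, 0.0), (1.0, 2.0)]
    return [bisect_zero(lambda y: dF(x, y), lo, hi) for lo, hi in brackets]


if __name__ == "__main__":
    for x in (0.01, 0.1, 0.19):
        print(x, three_zeros(x))
```

For each tested $x$ the three zeros lie close to $-1$, $0$ and $1$, and $x$ lies strictly between the smallest and the largest of them, so it belongs to their convex hull.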

Therefore any $x\in\,]0,1/5[$ gives rise to the maximum of $F(\cdot,y)$ for three different $y\in[-1,0]\cup[1,2]$. Hence there are $y_1,y_2,y_3$ such that $\psi(y_i)=y_i^4-F(x,y_i)$ for $i=1,2,3$. Notice that $x$ is in the convex hull of these points because $y_1$ is close to $-1$, $y_2$ is close to $0$ and $y_3$ close to $1$. Hence there exists a martingale transport plan $\pi$ concentrated on $[0,1/5]\times([-1,0]\cup[1,2])$ such that $\pi_x$ is supported on $\{y_1,y_2,y_3\}(x)$ with positive $\mu$-probability. Moreover it follows from the explanations above that this martingale transport plan is optimal. Namely (20) holds $\pi$-almost surely. Hence we have proved that the bound $k=3$ of Theorem 7.1 is sharp in the case $c(x,y)=(y-x)^4$.

7.4. The Hobson-Neuberger cost function and its converse. As mentioned in the introduction, Hobson and Neuberger [15] study the case $c(x,y)=-|y-x|$, motivated by applications in mathematical finance. They identify the minimizer $\pi_{HN}$ based on a construction of the maximizers for the dual problem. Here some
conditions on the underlying measures are necessary; an example in ^ Proposition 
5.2] shows that the dual maximizers need not always exist. Based on Lemma 1.11 we partly recover their result. Throughout this part we will only deal with the case of a continuous starting distribution $\mu$ (see Remark 7.6 on this hypothesis).



Theorem 7.3. Assume that $\mu$ and $\nu$ are in convex order and that $\mu$ is continuous. There exists a unique optimal martingale transport plan $\pi_{HN}$ for the cost function $c(x,y)=-|y-x|$.

Moreover, there exist two non-decreasing functions $T_1,T_2:\mathbb{R}\to\mathbb{R}$ such that $T_1(x)\le x\le T_2(x)$ and $\pi_{HN}$ is concentrated on the graphs of these functions.

A similar behavior holds for the cost function $c(x,y)=|y-x|$ built on the absolute value $h:x\mapsto|x|$. We have learned about the structure of the optimizer for this cost function from D. Hobson and M. Klimmek [14]. Recall that $\Gamma_x=\{y:(x,y)\in\Gamma\}$ for $\Gamma\subseteq\mathbb{R}^2$.

Theorem 7.4. Assume that $\mu$ and $\nu$ are in convex order and that $\mu$ is continuous. There exists a unique optimal martingale transport plan $\pi_{abs}$ for the cost function $c(x,y)=|y-x|$.

Moreover there is a set $\Gamma$ such that $\pi_{abs}$ is concentrated on $\Gamma$ and $|\Gamma_x|\le 3$ for every $x\in\mathbb{R}$. More precisely, $\pi_{abs}$ can be decomposed into $\pi_{stay}+\pi_{go}$ where $\pi_{stay}=(\mathrm{Id}\otimes\mathrm{Id})_\#(\mu\wedge\nu)$ (this measure is concentrated on the diagonal of $\mathbb{R}^2$) and $\pi_{go}$ is concentrated on $\mathrm{graph}(T_1)\cup\mathrm{graph}(T_2)$ where $T_1,T_2$ are real functions.



The "combinatorial core" of the proofs of Theorem 7.3 and Theorem 7.4 is contained in the following lengthy but simple lemma.

Lemma 7.5. Let $x,y^-,y^+,y'\in\mathbb{R}$ such that $y^-<y'<y^+$. Pick $\lambda$ such that $\lambda y^++(1-\lambda)y^-=y'$. For $x'\in\mathbb{R}$ we want to compare the quantities

$$A:=\lambda|x-y^+|+(1-\lambda)|x-y^-|+|x'-y'|,\qquad B:=\lambda|x'-y^+|+(1-\lambda)|x'-y^-|+|x-y'|.$$

(1) Assume that $y'<x$. Then there exists $x_0\in\,]y^-,y'[$ such that $(A-B)$ seen as a function of $x'$ exactly vanishes at $x_0$ and $x$, is strictly positive outside $[x_0,x]$ and strictly negative in $]x_0,x[$.

    x'          |  -∞    y^-    x_0    y'    x    +∞
    (A-B)(x')   |     +          0     -     0    +

(2) Assume that $y'>x$. Then there exists $x_1\in\,]y',y^+[$ such that $(A-B)$ vanishes if $x'\in\{x_1,x\}$, is strictly positive outside $[x,x_1]$ and strictly negative in $]x,x_1[$.

    x'          |  -∞    x    y'    x_1    y^+    +∞
    (A-B)(x')   |     +  0    -      0      +

(3) Assume that $y'=x$. Then $A-B$ is non-negative and vanishes exactly in $x$.

    x'          |  -∞    y^-    x = y'    y^+    +∞
    (A-B)(x')   |     +           0        +

Proof. Consider the function

$$f(t)=\lambda|t-y^+|+(1-\lambda)|t-y^-|-|t-y'|.$$

Then $A>B$ is equivalent to $f(x)>f(x')$ and $A=B$ is equivalent to $f(x)=f(x')$.

The behavior of the function $f$ is easy enough to understand. On the intervals $]-\infty,y^-]$, $[y^+,\infty[$, the function is zero. On the interval $[y^-,y']$ it increases linearly from $0$ to $2\lambda(1-\lambda)(y^+-y^-)$. On the interval $[y',y^+]$ it decreases linearly from $2\lambda(1-\lambda)(y^+-y^-)$ to $0$.

The above assertions are simple consequences of this behavior. Moreover it is easy to calculate $x_0$, $x_1$ explicitly. For instance in the case $y'<x$ pick $t\in\,]0,1[$ such that $x=y'+t(y^+-y')$. Then $x_0=y'+t(y^--y')$. □
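The tent shape of $f$ and the explicit formula for $x_0$ are easy to check numerically. The sketch below is an illustration only; the function names and sample values are assumptions of this example. It evaluates $f$, computes $A-B$ directly from the definition in Lemma 7.5, and locates the mirror point on the other side of $y'$ where $f$ takes the same value as at $x$ (this assumes $y^-<x<y^+$).

```python
def tent(t, y_minus, y_plus, lam):
    """f(t) = lam*|t - y+| + (1 - lam)*|t - y-| - |t - y'|, y' = lam*y+ + (1 - lam)*y-."""
    y_prime = lam * y_plus + (1 - lam) * y_minus
    return lam * abs(t - y_plus) + (1 - lam) * abs(t - y_minus) - abs(t - y_prime)


def a_minus_b(x, x_prime, y_minus, y_plus, lam):
    """A - B computed directly from the definitions of A and B in Lemma 7.5."""
    y_prime = lam * y_plus + (1 - lam) * y_minus
    a = lam * abs(x - y_plus) + (1 - lam) * abs(x - y_minus) + abs(x_prime - y_prime)
    b = lam * abs(x_prime - y_plus) + (1 - lam) * abs(x_prime - y_minus) + abs(x - y_prime)
    return a - b


def mirror_point(x, y_minus, y_plus, lam):
    """x_0 (if x > y') or x_1 (if x < y'): the other point where f equals f(x)."""
    y_prime = lam * y_plus + (1 - lam) * y_minus
    if x > y_prime:
        t = (x - y_prime) / (y_plus - y_prime)
        return y_prime + t * (y_minus - y_prime)
    t = (y_prime - x) / (y_prime - y_minus)
    return y_prime + t * (y_plus - y_prime)


if __name__ == "__main__":
    y_minus, y_plus, lam = -1.0, 3.0, 0.25      # hence y' = 0
    x = 1.5                                     # case (1): y' < x
    x0 = mirror_point(x, y_minus, y_plus, lam)
    print("peak value f(y') :", tent(0.0, y_minus, y_plus, lam))
    print("x_0              :", x0)
    print("A - B at x' = x_0:", a_minus_b(x, x0, y_minus, y_plus, lam))
    print("A - B inside     :", a_minus_b(x, 0.5, y_minus, y_plus, lam))
    print("A - B outside    :", a_minus_b(x, -0.9, y_minus, y_plus, lam))
```

The identity $A-B=f(x)-f(x')$ is what turns the three cases of the lemma into statements about the level sets of a single piecewise linear function.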

Proof of Theorem 7.3. Pick $\Gamma$ according to Lemma 1.11 and $(x,y^-),(x,y^+),(x',y')\in\Gamma$, with $y^-<y'<y^+$. Then it cannot happen that

(23)  $y'\le x'<x$  or  $x<x'\le y'$.

Indeed choosing $\lambda\in\,]0,1[$ and $\alpha$ resp. $\alpha'$ as in the proof of Theorem 6.1 we find that an improvement is possible if

$$-\lambda|x-y^+|-(1-\lambda)|x-y^-|-|x'-y'|>-\lambda|x'-y^+|-(1-\lambda)|x'-y^-|-|x-y'|.$$

But this inequality holds in the just mentioned cases by Lemma 7.5.

Consider the set $A$ of points $a$ such that $\Gamma_a$ contains at least three points and assume by contradiction that this set is uncountable. According to Lemma 3.2 there is an accumulation effect at some $a\in A$ together with $b^-,b,b^+\in\Gamma_a$ in the order $b^-<b<b^+$. (Without loss of generality one may assume $b\le a$.) In particular Lemma 3.2 provides $(a_0,b_0^-),(a_0,b_0^+)\in\Gamma$ such that $a<a_0<b_0^+$ and $b_0^-<b$. We have settled the first forbidden situation of (23) for $(x,y^-)=(a_0,b_0^-)$, $(x,y^+)=(a_0,b_0^+)$ and $(x',y')=(a,b)$, which provides the contradiction. Hence $A$ is countable and $\mu(A)=0$. It follows that one can assume $|\Gamma_a|\le 2$ for every $a\in\mathbb{R}$.

We may thus assume there exist $T_1$ and $T_2$ from $\mathrm{proj}^x(\Gamma)$ to $\mathbb{R}$ such that $\Gamma_x=\{T_1(x),T_2(x)\}$ where $T_1(x)\le x\le T_2(x)$ for $\mu$-almost every $x\in\mathrm{proj}^x(\Gamma)$. It remains to show that $T_1$ and $T_2$ are monotone. Let $x,x'\in\mathbb{R}$ with $x<x'$. We necessarily have $T_2(x)\le T_2(x')$ since the opposite inequality leads to the second forbidden inequality in (23) taking $y^-=T_1(x)$, $y'=T_2(x')$ and $y^+=T_2(x)$. The monotonicity of $T_1$ is established in the same way.

It remains to show that the optimizer is unique. Due to the linear structure of the optimization problem the set of solutions is convex. Hence Lemma 5.7 applies. □

Remark 7.6. If $\mu$ is not continuous, there may be more than one minimizer. This is the case for example if $\mu$ and $\nu$ are chosen as in Paragraph 7.3.1. In fact if $h$ is an even function then for the cost function $c(x,y)=h(y-x)$ (e.g. $(x,y)\mapsto-|y-x|$) every martingale transport plan is optimal. Hence it seems that it is not directly possible to define the Hobson-Neuberger transport plan for a general $\mu$ in an unambiguous way.

Proof of Theorem 7.4. Let $\pi$ be an optimal martingale transport plan. Pick $\Gamma$ according to Lemma 1.11 and $(x,y^-),(x,y^+),(x',y')\in\Gamma$, with $y^-<y'<y^+$. Then it cannot happen that

(24)  $x'<x\le y'$  or  $y'\le x<x'$  or  $x'\notin[y^-,y^+]$.

Indeed choosing $\lambda\in\,]0,1[$, $\alpha$ and $\alpha'$ as in the proof of Theorem 6.1 above we find that an improvement of $\alpha$ by $\alpha'$ is possible if

$$\lambda|x-y^+|+(1-\lambda)|x-y^-|+|x'-y'|>\lambda|x'-y^+|+(1-\lambda)|x'-y^-|+|x-y'|.$$

And indeed this inequality holds in the just mentioned cases by Lemma 7.5. Note in particular that one of the forbidden cases of (24) occurs if $x\ne x'$ and $x=y'$. This will be crucial in the following argument which establishes that as much mass as possible is transported by the identity mapping. (Roughly speaking the following is forbidden: some mass goes from $x$ to $y^-$ and $y^+$ while some mass goes from $x'$ to $y'=x$.)

Set $\pi_0=\pi|_{\Delta}$, where $\Delta$ is the diagonal $\{(x,y)\in\mathbb{R}^2:x=y\}$ and $\bar\pi=\pi-\pi_0$, and let $\rho$ be the projection of $\pi_0$ onto the first (or the second) coordinate. As $\rho\le\mu$ and $\rho\le\nu$, we have $\rho\le\mu\wedge\nu$. We want to prove that $\rho=\mu\wedge\nu$, i.e. $\pi_0$ is $(\mathrm{Id}\otimes\mathrm{Id})_\#(\mu\wedge\nu)$. Let us define the reduced measures $\bar\mu=\mu-\rho$, $\bar\nu=\nu-\rho$ and $\kappa=\mu\wedge\nu-\rho$. Note that $\bar\pi\in\Pi_M(\bar\mu,\bar\nu)$ and that $\bar\pi$ is concentrated on $\bar\Gamma=\Gamma\setminus\Delta$. Hence we have the following:

• For $\bar\mu$-almost every $a$ there exist $b^-$ and $b^+$ such that $b^-<a<b^+$ and $(a,b^-),(a,b^+)\in\bar\Gamma$.

• For $\kappa$-almost every $b$ there exists some $a\ne b$ such that $(a,b)\in\bar\Gamma$.






As $\kappa\le\bar\mu$ we conclude that $\kappa$-almost every real number satisfies both of these conditions. Thus for $\kappa$-almost every $x$ there exist $y^-$, $y^+$ and $x'$ such that the points $(x,y^-)$, $(x,y^+)$ and $(x',x)$ are included in $\bar\Gamma$ and one has $x'\ne x$ and $x\in\,]y^-,y^+[$. This coincides with one of the forbidden situations of (24). Hence $\kappa$ has mass $0$ and $\pi_0=(\mathrm{Id}\otimes\mathrm{Id})_\#(\mu\wedge\nu)$ as claimed above.

Our next goal is to establish that, removing countably many points if necessary, we have $|\bar\Gamma_x|\le 2$ for every $x\in\mathbb{R}$. Indeed if this is not true, then there exist $a$ and $b^-<b<b^+\in\bar\Gamma_a$ to which the assertion of Lemma 3.2 applies. We know that $b<a$ or $a<b$; assume without loss of generality that $a<b$. But then there exist $a'$ with $b^-<a'<a$ and $b'$ with $a<b'<b^+$ such that $(a',b')\in\bar\Gamma$. This contradicts (24) (with $x=a$, $y^-=b^-$, $y^+=b^+$, $x'=a'$, $y'=b'$).

It remains to establish that there exists at most one optimizer. For optimal transports $\pi$ the static part $\pi_0=\pi|_{\Delta}$ is identically $(\mathrm{Id}\otimes\mathrm{Id})_\#(\mu\wedge\nu)$. Hence the reduced measure $\bar\pi=\pi-\pi_0$ is a minimizer of the martingale transport problem between $\bar\mu=\mu-\mu\wedge\nu$ and $\bar\nu=\nu-\mu\wedge\nu$. Note that $\bar\mu\wedge\bar\nu=0$ so that the optimal martingale couplings are concentrated on two Borel graphs. We conclude by Lemma 5.7. □



Remark 7.7. Exactly as in Remark 7.6 the hypothesis that $\mu$ is continuous is needed to prove uniqueness of the optimizer; $\pi_{abs}$ is not well-defined otherwise.



8. Appendix: A self-contained approach to the variational lemma 

In this appendix we provide a self-contained proof of the variational lemma (Lemma 1.11, established in Section 3). Indeed we obtain a somewhat stronger conclusion in Theorem 8.4 below. The benefit of this second version is that Theorem 8.4 does not rely on Choquet's capacitability theorem and that the new approach provides an explicit set $\Gamma$. A drawback is that we have to assume that the cost function is continuous. Compared to the approach given in Section 3 another disadvantage is that the argument does not seem to be adaptable from $\mathbb{R}\times\mathbb{R}$ to more general product spaces.

8.1. Preliminaries based on Lebesgue's density theorem. Our aim is to establish Corollary 8.3 which may be viewed as an avatar of Lemma 3.2, the uncountable set of points $a$ being replaced by a set $A$ of positive measure. We start with the well-known Lebesgue density theorem. It asserts that for an integrable function $f$ on $[0,1]$ we have

(25)  $\lim_{\varepsilon\to 0}\frac{1}{2\varepsilon}\int_{s-\varepsilon}^{s+\varepsilon}|f(s)-f(t)|\,dt=0$

for almost every $s\in\,]0,1[$. In sloppy language, almost every point is a "good" point. Those points will be called regular points of $f$. In those regular points $s$ we also have

(26)  $\lim_{n\to\infty}\frac{1}{\lambda(M_n)}\int_{M_n}|f(s)-f(t)|\,dt=0$

for every sequence $(M_n)$ of measurable sets satisfying $M_n\subseteq[s-\varepsilon_n,s+\varepsilon_n]$ with $\lambda(M_n)/\varepsilon_n$ bounded from below and $\varepsilon_n\to 0$. Special admissible choices are $M_n=[s,b_n]$ or $]s,b_n]$ and $M_n=[a_n,s]$ or $[a_n,s[$. A particular consequence of (26) is that

(27)  $\lim_{n\to\infty}\frac{1}{\lambda(M_n)}\int_{M_n}f(t)\,dt=f(s).$

Intervals $B=\,]q,q']$ or $]-\infty,q']$ with $q,q'\in\mathbb{Q}\cup\{-\infty,+\infty\}$ will be called rational semi-open intervals.

By Fubini's theorem, (26) implies the following result.

Lemma 8.1. Let $\pi$ be a probability measure on $\mathbb{R}\times\mathbb{R}$ with first marginal $\lambda_{[0,1]}$. Fix a disintegration $(\pi_x)_{x\in[0,1]}$. There exists a set $R\subseteq[0,1]$ of $\lambda$-measure 1 such that for $s\in R$, any rational semi-open interval $B$ and any two sequences $(a_n)_n$, $(b_n)_n$ satisfying $a_n,b_n\to s$ as well as $a_n\le s<b_n$ or $a_n<s\le b_n$, we have

$$\lim_{n\to\infty}\frac{1}{b_n-a_n}\int_{a_n}^{b_n}|\pi_t(B)-\pi_s(B)|\,d\lambda(t)=0.$$

We now extend this lemma to the case where the first marginal of $\pi$ is a general measure $\mu$, not necessarily equal to $\lambda_{[0,1]}$. Recall that $G_\mu$ denotes the inverse cumulative function of $\mu$. The measure $\mu$ can then be written as $(G_\mu)_\#\lambda$. The map $G_\mu$ is increasing on $[0,1]$ and hence continuous on the complement of a countable set $D$. In particular $\mu(G_\mu(D))=\lambda(D)=0$. Consider a random variable $(U,G_\mu(U),Y)$ on $[0,1]\times\mathbb{R}\times\mathbb{R}$ such that the law of $U$ is $\lambda$ and the law of $(G_\mu(U),Y)$ is $\pi$. Let $\tilde\pi$ be the law of $(U,Y)$ and $(\tilde\pi_s)_{s\in[0,1]}$ a disintegration with respect to $\lambda$, i.e. $\tilde\pi_s$ is the conditional law of $Y$ given the event $\{U=s\}$. Apply Lemma 8.1 to this disintegration of $\tilde\pi$ to obtain a set $R$. Let $S\subseteq\mathbb{R}$ be the set $G_\mu(R\setminus D)$ and let us call $S$ the set of regular points. Note that this set may depend on the disintegration of $\tilde\pi$ and that $\mu(S)=1$.

Lemma 8.2. Let $\pi$ be a probability measure on $\mathbb{R}^2$ with first marginal $\mu$ and $(\pi_x)_{x\in\mathbb{R}}$ a disintegration of $\pi$. There exists a set $S\subseteq\mathbb{R}$ of measure $\mu(S)=1$ satisfying the following: for any $x\in S$ and any rational semi-open interval $B$ the limit

$$\lim_{n\to\infty}\frac{1}{\mu(N_n)}\int_{N_n}|\pi_t(B)-\pi_x(B)|\,d\mu(t)$$

is zero for any sequence $N_n=[x-\varepsilon_n,x+\varepsilon_n]$ with $\varepsilon_n\downarrow 0$. If moreover $\mu(\{x\})=0$ then the sequences $N_n=\,]x,x+\varepsilon_n]$ and $N_n=[x-\varepsilon_n,x[$ are also admissible.

Proof. We note that if the statement of the lemma holds for one particular disintegration of $\pi$, then it automatically carries over to any other disintegration. Therefore we will consider a disintegration of $\pi$ which is convenient for the proof. Let $S$ and $\tilde\pi$ be as in the discussion preceding Lemma 8.2 and set for $x\in\mathbb{R}$

$$\pi_x=\begin{cases}\tilde\pi_{F_\mu(x)} & \text{if }\mu(\{x\})=0,\\[4pt] \dfrac{1}{\mu(\{x\})}\displaystyle\int_{G_\mu^{-1}(\{x\})}\tilde\pi_s\,ds & \text{if }\mu(\{x\})>0.\end{cases} \tag{28}$$

Let $x$ be a point in $S$ and $N_n$ a sequence as in the formulation of the lemma. Note that we have $G_\mu\circ F_\mu(x)=x$. Set $M_n=G_\mu^{-1}(N_n)$. Then $\lambda(M_n)=\mu(N_n)>0$ since $x=G_\mu(s)$ for some continuity point $s$ of $G_\mu$. By definition of $(\pi_x)_x$ we have

$$\frac{1}{\mu(N_n)}\int_{N_n}|\pi_t(B)-\pi_x(B)|\,d\mu(t)\le\frac{1}{\lambda(M_n)}\int_{M_n}|\tilde\pi_s(B)-\pi_x(B)|\,d\lambda(s) \tag{29}$$

and these quantities tend to 0 provided that we can apply Lemma 8.1 to the sequence $M_n$. We distinguish two cases.

• Assume that $\mu(\{x\})=0$. In this case $x$ has a unique pre-image $s$ that satisfies $G_\mu(s)=x$ and, as $s\notin D$, the functions $F_\mu$ and $G_\mu$ are continuous in $x$ and $s$ respectively. Recall that $\pi_x(B)=\tilde\pi_s(B)$. Let us first assume that $N_n=[x-\varepsilon_n,x+\varepsilon_n]$ with $\varepsilon_n\downarrow 0$. Then $M_n=G_\mu^{-1}(N_n)=[F_\mu((x-\varepsilon_n)^-),F_\mu(x+\varepsilon_n)]$. As $F_\mu$ is continuous in $x$ we can apply Lemma 8.1. As $\mu(\{x\})=0$ we may replace the sequence of intervals $]x,x+\varepsilon_n]$ by $N_n=[x,x+\varepsilon_n]$. Then we can conclude as before since in this case $M_n=[s,F_\mu(x+\varepsilon_n)]$. The case $[x-\varepsilon_n,x[$ can be dealt with analogously.

• Assume that $\mu(\{x\})>0$. Let us consider $N_n=[x-\varepsilon_n,x+\varepsilon_n]$. Then $\mu(N_n)\to\mu(\{x\})$ as $\varepsilon_n\downarrow 0$. Hence

$$\frac{1}{\mu(N_n)}\int_{N_n}|\pi_t(B)-\pi_x(B)|\,d\mu(t)=\frac{1}{\mu(N_n)}\int_{\{x\}}|\pi_t(B)-\pi_x(B)|\,d\mu(t)+\frac{1}{\mu(N_n)}\int_{N_n\setminus\{x\}}|\pi_t(B)-\pi_x(B)|\,d\mu(t).$$

The first part of the sum equals 0 and the second part tends to 0 since $|\pi_t(B)-\pi_x(B)|\le 2$ and $\mu(\{x\})/\mu(N_n)\to 1$ as $\varepsilon_n\to 0$.

□ 

We remark that for $\pi\in\mathcal{P}(\mathbb{R}^2)$, if $y\in\operatorname{spt}(\pi_x)$, it is not always true that $(x,y)\in\operatorname{spt}(\pi)$. We have introduced $S$ in order to obtain this conclusion for $x\in S$. More precisely, we obtain:



Corollary 8.3. Let $S$ be a set of regular points as in Lemma 8.2 and $x\in S$. Let $B_1,\dots,B_k$ be a family of pairwise disjoint rational semi-open intervals such that $\pi_x(B_j)>0$ for $j=1,\dots,k$.

For every $\varepsilon>0$ there exists $A\subseteq\mathbb{R}\cap[x-\varepsilon,x+\varepsilon]$ such that $\mu(A)>0$ and $\pi_t(B_j)>0$ for $(j,t)\in\{1,\dots,k\}\times A$. Moreover if $x$ is not an atom of $\mu$, then the set $A$ can be chosen as a subset of $]x,x+\varepsilon]$ (resp. as a subset of $[x-\varepsilon,x[$).

Proof. Let $\pi$, $x$, $\varepsilon$ and the sets $B_j$ be given. Let $(\varepsilon_n)_n$ be a decreasing sequence of positive numbers tending to 0. For every $j$ we have

$$\lim_{n\to\infty}\frac{1}{\mu(N_n)}\int_{N_n}|\pi_x(B_j)-\pi_t(B_j)|\,d\mu(t)=0,$$

where $N_n$ is $[x-\varepsilon_n,x+\varepsilon_n]$ or, in the case $\mu(\{x\})=0$, one of the intervals $]x,x+\varepsilon_n]$ resp. $[x-\varepsilon_n,x[$. By Markov's inequality this implies

$$\mu(\{t\in N_n,\ |\pi_x(B_j)-\pi_t(B_j)|>\pi_x(B_j)/2\})=o(\mu(N_n)).$$

Therefore

$$\mu(\{t\in N_n\mid\exists j\in\{1,\dots,k\},\ |\pi_x(B_j)-\pi_t(B_j)|>\pi_x(B_j)/2\})=o(\mu(N_n))$$

and

$$\mu(\{t\in N_n\mid\exists j\in\{1,\dots,k\},\ \pi_t(B_j)=0\})=o(\mu(N_n)).$$

Hence for $n$ sufficiently large the set

$$A=\{t\in N_n\mid\forall j\in\{1,\dots,k\},\ \pi_t(B_j)>0\}$$

has positive measure. For almost all $n$ we also have $\varepsilon_n<\varepsilon$, which concludes the proof. □

8.2. Construction of a better competitor when $\Gamma$ supports a finite non-optimal coupling. Let $V$ be the set of signed measures $\sigma$ with Hahn decomposition $\sigma=\sigma^+-\sigma^-$ such that the following conditions are satisfied:

• The total mass of $\sigma$ is 0.

• The marginals $\operatorname{proj}^x\sigma$ and $\operatorname{proj}^y\sigma$ vanish identically.

• The measure $\operatorname{proj}^y(|\sigma|)=\operatorname{proj}^y\sigma^++\operatorname{proj}^y\sigma^-$ has finite first moment.

• $\sigma$ has a disintegration $(\sigma_x)_x$ such that $(\operatorname{proj}^x|\sigma|)(x)$-almost surely, the positive and the negative parts of $\sigma_x$ have the same mean.

If only the three first conditions are satisfied $\sigma$ will be an element of $V'$.

The letter $V$ is reminiscent of the term variation. Indeed observe that if $\alpha$ is a positive measure on $\mathbb{R}^2$ such that $\operatorname{proj}^y\alpha$ has finite first moment and $\beta=\alpha-\sigma$ is a positive measure, then $\beta$ is a competitor of $\alpha$ in the sense of Definition 1.10. Conversely for a pair of competitors $(\alpha,\beta)$ the measures $\alpha-\beta$ and $\beta-\alpha$ are elements of $V$. A notable element of $V$ is $(\delta_x-\delta_{x'})\otimes(\lambda\delta_{y^+}+(1-\lambda)\delta_{y^-}-\delta_{\lambda y^++(1-\lambda)y^-})$, the kind of measure that we have used repeatedly in Sections 6 and 7. An element of $V$ will be called a variation. A variation $\sigma$ is positive (resp. negative) if $\int c(x,y)\,d\sigma(x,y)>0$ (resp. $<0$).

For a cost function satisfying the sufficient integrability condition, it is not difficult to prove that the following statements are equivalent:

(1) The martingale transport plan $\alpha$ is optimal for the cost $c$,

(2) for $\sigma\in V$ such that $\sigma^+\le\alpha$, one has $\int c(x,y)\,d\sigma(x,y)\le 0$.

We can now state the main result of this appendix.

Theorem 8.4. Assume that $\mu,\nu$ are probability measures in convex order and that $c:\mathbb{R}^2\to\mathbb{R}$ is a continuous cost function satisfying the sufficient integrability condition. Assume that $\pi\in\Pi_M(\mu,\nu)$ is an optimal martingale transport plan which leads to finite costs. Let $S$ be a set of regular points associated to $\pi$ and $(\pi_x)_{x\in\mathbb{R}}$ a disintegration in the sense of Lemma 8.2. We set

$$\Gamma=\{(x,y)\in\mathbb{R}^2\mid x\in S\text{ and }y\in\operatorname{spt}(\pi_x)\}.$$

If $\alpha$ is a martingale transport plan such that

• the support $\operatorname{spt}(\alpha)$ of $\alpha$ is finite and

• the support $\operatorname{spt}(\alpha)$ is included in $\Gamma$,

then the martingale transport plan $\alpha$ is optimal for $c$ between $\operatorname{proj}^x\alpha$ and $\operatorname{proj}^y\alpha$.

Furthermore if $\sigma$ is a measure of finite support in $V$ with $\operatorname{spt}(\sigma^+)\subseteq\Gamma$, it is a non-positive variation.

Proof. Let $\alpha$ be as in the theorem and assume by contradiction that there exists a competitor $\beta$ that leads to smaller costs. We will prove that $\pi$ cannot be optimal, thus establishing the desired contradiction. In other words assume that there is a variation $\sigma\in V$ with $\operatorname{spt}\sigma^+\subseteq\operatorname{spt}\alpha$ and $\int c(x,y)\,d\sigma(x,y)>0$. We will construct $\bar\sigma\in V$ by applying modifications to $\sigma$ so that $\bar\sigma^+\le\pi$ and $\int c(x,y)\,d\bar\sigma(x,y)>0$. This yields a contradiction since the competitor $\pi-\bar\sigma$ is cheaper than $\pi$ with respect to the cost function $c$.



The argument is based on two lemmas and Proposition 8.6 whose proof is postponed to the next subsection. Let us introduce some notations. Assume first that $\operatorname{spt}|\sigma|$ is included in $\{x_1,\dots,x_n\}\times\{y_1,\dots,y_m\}$ and define for $\varepsilon>0$ the rectangle $R_{ij}(\varepsilon)=[x_i-\varepsilon,x_i+\varepsilon]\times[y_j-\varepsilon,y_j+\varepsilon]$.

Lemma 8.5. There exists $\varepsilon>0$ such that the sets $R_{ij}(\varepsilon)$ are disjoint and any measure $\sigma'\in V$ satisfying

• $|\sigma'|$ is concentrated on $\bigcup_{i,j}R_{ij}(\varepsilon)$ and

• for $(i,j)\in\{1,\dots,n\}\times\{1,\dots,m\}$

$$|\sigma(R_{ij})|-\varepsilon<|\sigma'(R_{ij})|<|\sigma(R_{ij})|+\varepsilon,$$

is a positive variation.

Proof. The argument relies on the continuity of $c$ and is straightforward. □

Let us call $V(\sigma,\varepsilon)$ the subset of the measures $\sigma'\in V$ satisfying the conditions of the above lemma. We want to find a measure $\sigma'\in V(\sigma,\varepsilon)$ such that $\sigma'^+\le\pi$. For this purpose we will use the fact that $\sigma^+$ is concentrated on $\Gamma$.

Using the notations of Corollary 8.3, let $A_i$ be the set $A$ associated to $x_i$ and consider an arbitrary family of rational semi-open intervals $B_j$ with $y_j\in B_j\subseteq[y_j-\varepsilon,y_j+\varepsilon]$ and $\pi_{x_i}(B_j)>0$ for each $j$. Moreover we take $A_i\subseteq\mathbb{R}\cap[x_i-\varepsilon,x_i+\varepsilon]$ for every $i$.

Proposition 8.6. Let $\varepsilon>0$. There are sets $A_1,\dots,A_n$ with $\mu(A_i)>0$ and $A_i\subseteq[x_i-\varepsilon,x_i+\varepsilon]$ such that for $(t_1,\dots,t_n)\in A_1\times\cdots\times A_n$ there is a measure $\sigma_{t_1,\dots,t_n}\in V$ satisfying the following:

• One has $\sigma_{t_1,\dots,t_n}\in V(\sigma,\varepsilon)$.

• The first marginal of $|\sigma_{t_1,\dots,t_n}|$ has support $\{t_1,\dots,t_n\}$.

• One has $\sigma^+_{t_1,\dots,t_n}\le\sum_{i=1}^n\mu(A_i)\,\delta_{t_i}\otimes\pi_{t_i}$.

We postpone the proof of Proposition 8.6 to the next subsection.

Note that $\sigma_{t_1,\dots,t_n}$ is not the measure $\bar\sigma$ we are looking for. Nevertheless it satisfies almost all the conditions. It is in $V$ and even in $V(\sigma,\varepsilon)$ so that according to Lemma 8.5 it is a positive variation. The only missing condition is that $\sigma^+_{t_1,\dots,t_n}$ is not smaller than $\pi$. We provide a remedy in the following lemma:
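Lemma 8.7 (A variation $\bar\sigma$ leading to the contradiction). The measure

$$\bar\sigma=\frac{1}{\mu(A_1)\cdots\mu(A_n)}\int_{A_1\times\cdots\times A_n}\sigma_{t_1,\dots,t_n}\,d\mu(t_1)\otimes\cdots\otimes d\mu(t_n)$$

is in $V(\sigma,\varepsilon)$ and satisfies both $\iint c(x,y)\,d\bar\sigma(x,y)>0$ and $\bar\sigma^+\le\pi$. Hence $\pi-\bar\sigma$ gives rise to smaller costs than $\pi$.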

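Proof. As all $\sigma_{t_1,\dots,t_n}$ are in $V(\sigma,\varepsilon)$ up to a positive multiplicative constant, we know that they are positive variations. Hence $\bar\sigma$, which is an average of these measures in $V$, is also a positive variation. Let us prove that $\bar\sigma^+\le\pi$. Observe that $\bar\sigma^+$ is again the average of the positive parts $\sigma^+_{t_1,\dots,t_n}$. By Proposition 8.6 this is smaller than

$$\frac{1}{\mu(A_1)\cdots\mu(A_n)}\int_{A_1\times\cdots\times A_n}\sum_{i=1}^n\mu(A_i)(\delta_{t_i}\otimes\pi_{t_i})\,d\mu(t_1)\otimes\cdots\otimes d\mu(t_n)$$

$$=\sum_{i=1}^n\frac{\mu(A_i)}{\mu(A_1)\cdots\mu(A_n)}\int\!\cdots\!\int_{A_1\times\cdots\times A_n}(\delta_{t_i}\otimes\pi_{t_i})\,d\mu(t_1)\otimes\cdots\otimes d\mu(t_i)\otimes\cdots\otimes d\mu(t_n)$$

$$=\sum_{i=1}^n\frac{1}{\mu(A_1)\cdots\mu(A_n)}\int_{A_i}(\delta_{t_i}\otimes\pi_{t_i})\,d\mu(t_i)\times\mu(A_1)\times\cdots\times\mu(A_n).$$

The last expression is the sum over $i$ of the restrictions of $\pi$ to the strips $A_i\times\mathbb{R}$. Since the sets $A_i$ are contained in the pairwise disjoint intervals $[x_i-\varepsilon,x_i+\varepsilon]$, this sum is at most $\pi$. □

Up to Proposition 8.6 we have thus proved Theorem 8.4. □

8.3. Proof of Proposition 8.6. Recall the definitions and notations of Theorem 8.4 and Proposition 8.6. In particular $\sigma$ has finite support included in $\Gamma$. It is also included in $\{x_1,\dots,x_n\}\times\{y_1,\dots,y_m\}$ where $m$ and $n$ are taken as small as possible. For $\tau\in V'$ we denote the support of $\operatorname{proj}^x(|\tau|)$ by $X(\tau)$ and the support of $\operatorname{proj}^y(|\tau|)$ by $Y(\tau)$ so that $\{x_1,\dots,x_n\}=X(\sigma)$ and $\{y_1,\dots,y_m\}=Y(\sigma)$. Let $d\le n\cdot m$ be the cardinality of $\operatorname{spt}(\sigma^+)$ and denote its elements by $p_1,\dots,p_d$.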

For measures of finite support the conditions for being in $V$ can be simplified. A measure $\tau$ is in $V$ if

(1) for every $y\in Y(\tau)$, $L_y(\tau)$ defined as $\sum_{x\in X}\tau(x,y)$ is zero,

(2) for every $x\in X(\tau)$, $C_x(\tau)$ defined as $\sum_{y\in Y}\tau(x,y)$ is zero,

(3) for every $x\in X(\tau)$, $M_x(\tau)$ defined as $\sum_{y\in Y}\tau(x,y)\times y$ is zero.

Moreover the measure $\tau$ is an element of $V'$ if Conditions (1) and (2) are satisfied.
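For finitely supported measures, Conditions (1) to (3) can be checked mechanically. The following sketch assumes that a signed measure is stored as a dictionary mapping points $(x,y)$ to weights; the helper names `line_sums`, `in_V_prime` and `in_V` are hypothetical. It tests the product measure $(\delta_0-\delta_1)\otimes(\delta_2-\delta_0)$, which satisfies Conditions (1) and (2) but not (3), and an instance of the notable element of $V$ from Section 8.2 with the illustrative choice $x=0$, $x'=1$, $y^+=3$, $y^-=-1$, $\lambda=1/4$.

```python
from fractions import Fraction


def line_sums(tau):
    """Return (L, C, M): horizontal masses L_y, vertical masses C_x and
    vertical first moments M_x of a finitely supported signed measure."""
    L, C, M = {}, {}, {}
    for (x, y), w in tau.items():
        L[y] = L.get(y, 0) + w
        C[x] = C.get(x, 0) + w
        M[x] = M.get(x, 0) + w * y
    return L, C, M


def in_V_prime(tau):
    """Conditions (1) and (2): every horizontal and vertical line has mass zero."""
    L, C, _ = line_sums(tau)
    return all(v == 0 for v in L.values()) and all(v == 0 for v in C.values())


def in_V(tau):
    """Conditions (1), (2) and (3)."""
    _, _, M = line_sums(tau)
    return in_V_prime(tau) and all(v == 0 for v in M.values())


# (delta_0 - delta_1) tensor (delta_2 - delta_0)
m = {(0, 2): 1, (0, 0): -1, (1, 2): -1, (1, 0): 1}

# (delta_0 - delta_1) tensor (1/4 delta_3 + 3/4 delta_{-1} - delta_0)
q = Fraction(1, 4)
sigma = {(0, 3): q, (0, -1): 3 * q, (0, 0): Fraction(-1),
         (1, 3): -q, (1, -1): -3 * q, (1, 0): Fraction(1)}

print(in_V_prime(m), in_V(m), in_V(sigma))
```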

We introduce some further notations. For every $\tau\in V'$ of finite support we introduce a relation between the points of $X(\tau)$. We write $x\to x'$ if there are $y,y'$, $y>y'$ such that $\tau(x,y)$ and $\tau(x',y')$ are not zero. If $x\to x'$ and $x'\to x$ we write $x\leftrightarrow x'$ and will say that $x$ double-touches $x'$. If $\tau\in V$, for any point $x\in X(\tau)$ an important corollary of Condition (3) is that there exist three distinct points $y,y',y''$ such that $\tau(x,y)$, $\tau(x,y')$ and $\tau(x,y'')$ are not zero. Hence $x\leftrightarrow x$ if $x\in X(\tau)$. However the relation $\leftrightarrow$ is not transitive. If $x\in X$ double-touches both $x'$ and $x''$ we say that $x$ is a bridge over $x'$ and $x''$. In particular if $x\leftrightarrow x'$ the point $x$ is a bridge over $x'$ and $x$ itself.
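The relations above are easy to compute for a finitely supported measure. The sketch below, whose function names and example measure are assumptions made for illustration, uses the observation that $x\to x'$ holds exactly when the highest point charged on the vertical line through $x$ lies strictly above the lowest point charged on the line through $x'$. The example measure has vanishing horizontal and vertical line masses, so it is in $V'$; the points 0 and 2 do not double-touch each other although both double-touch 1, which illustrates that the relation is not transitive and that 1 is a bridge over 0 and 2.

```python
def heights(tau, x):
    """Second coordinates y with tau(x, y) != 0."""
    return [y for (u, y), w in tau.items() if u == x and w != 0]


def touches(tau, x, x_prime):
    """x -> x': there are y > y' with tau(x, y) != 0 and tau(x', y') != 0."""
    return max(heights(tau, x)) > min(heights(tau, x_prime))


def double_touches(tau, x, x_prime):
    """x <-> x': both x -> x' and x' -> x hold."""
    return touches(tau, x, x_prime) and touches(tau, x_prime, x)


def bridges(tau, x, x_prime):
    """Points of X(tau) that double-touch both x and x'."""
    support_x = sorted({u for (u, _), w in tau.items() if w != 0})
    return [z for z in support_x
            if double_touches(tau, z, x) and double_touches(tau, z, x_prime)]


# Illustrative measure: every horizontal and vertical line has mass zero.
tau = {(0, 0): 1, (0, 1): -1,
       (1, 0): -1, (1, 1): 1, (1, 2): 1, (1, 3): -1,
       (2, 2): -1, (2, 3): 1}

print(double_touches(tau, 0, 1), double_touches(tau, 1, 2),
      double_touches(tau, 0, 2))
print(bridges(tau, 0, 2))
```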

Roughly speaking for $\tau\in V$, the relation $x\to x'$ means that it is possible to replace $\tau$ (in a continuous manner) by a signed measure $\tau'\in V'$ such that $\tau^+$ and $\tau'^+$ have the same support. Doing this modification $\tau\mapsto M_x(\tau)$ increases while $\tau\mapsto M_{x'}(\tau)$ decreases (and the sum is a constant function). More precisely consider $y,y'$, $y>y'$ such that $\tau(x,y)$ and $\tau(x',y')$ are both non zero. Let $m$ be the measure $(\delta_x-\delta_{x'})\otimes(\delta_y-\delta_{y'})$. Notice that $m$ is an element of $V'\setminus V$. Considering $\tau^h=\tau+h\cdot m$ and $h>0$ we have

$$M_x(\tau^h)-M_x(\tau)=h\cdot M_x(m)=h\cdot(y-y')>0.$$

We only consider positive $h$ in order to keep the same support for $(\tau^h)^+$ and $\tau^+$. In particular this prevents that $\tau(x,y')>0$ and $\tau(x',y)>0$. For the same reason we choose $h\in[0,h_0[$ where $h_0=\max(|\tau(x,y)|,|\tau(x',y')|)$. Indeed if $\tau(x,y)<0$ then the same applies to $\tau^h(x,y)$.
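The effect of the modification $\tau^h=\tau+h\cdot m$ can be verified on a small example. In this sketch the measure `tau`, the value $h=1/8$ and the helper names are assumptions chosen for illustration; `tau` is an element of $V$ of the form $(\delta_0-\delta_1)\otimes(\tfrac14\delta_3+\tfrac34\delta_{-1}-\delta_0)$, and the modification uses $x=0$, $x'=1$, $y=3$, $y'=-1$. The first moment at $x$ increases by $h\cdot(y-y')$, the one at $x'$ decreases by the same amount, and the support of the positive part is unchanged for this small $h$.

```python
from fractions import Fraction


def first_moment(tau, x):
    """M_x(tau): sum over y of tau(x, y) * y."""
    return sum(w * y for (u, y), w in tau.items() if u == x)


def perturb(tau, x, x_prime, y, y_prime, h):
    """Return tau + h * (delta_x - delta_x') tensor (delta_y - delta_y')."""
    out = dict(tau)
    for point, sign in (((x, y), 1), ((x, y_prime), -1),
                        ((x_prime, y), -1), ((x_prime, y_prime), 1)):
        out[point] = out.get(point, 0) + sign * h
    return out


def positive_support(tau):
    """Points carrying strictly positive mass."""
    return {p for p, w in tau.items() if w > 0}


q = Fraction(1, 4)
tau = {(0, 3): q, (0, -1): 3 * q, (0, 0): Fraction(-1),
       (1, 3): -q, (1, -1): -3 * q, (1, 0): Fraction(1)}
h = Fraction(1, 8)
tau_h = perturb(tau, 0, 1, 3, -1, h)

print(first_moment(tau_h, 0) - first_moment(tau, 0))  # h * (y - y')
print(first_moment(tau_h, 1) - first_moment(tau, 1))  # -h * (y - y')
print(positive_support(tau_h) == positive_support(tau))
```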

If we want to make $M_x$ and $M_{x'}$ vary in the opposite direction we may consider the relation $x'\to x$ in place of $x\to x'$. Thus $x\leftrightarrow x'$ allows us to make small variations of $M_x$ and $M_{x'}$ in the one or the other direction. If there is a bridge $x''\in X(\tau)$ over $x$ and $x'$ we have exactly the same freedom as if $x\leftrightarrow x'$. The next lemma is a tool for finding bridges between points when $\tau\in V$.

Lemma 8.8. Let $\tau$ be a finitely supported element of $V$ and $(x,y)\in X(\tau)\times Y(\tau)$ such that $\tau(x,y)>0$. Let $G\subseteq X(\tau)$ be the subset of points $x'$ such that

• there exists a bridge over $x$ and $x'$,

• $\tau(x',y)<0$.

Then

$$\tau(x,y)+\sum_{x'\in G}\tau(x',y)\le 0.$$

Proof. Condition (1) implies that if every $x'\in X(\tau)$ satisfying $\tau(x',y)<0$ is connected with $x$ by a bridge, we are done. Conversely assume that there exists $x'\in X(\tau)$ such that $\tau(x',y)<0$ and there is no bridge between $x$ and $x'$. Then for $x_0\in X(\tau)$ the measure $|\tau|$ restricted to $\{x_0\}\times\mathbb{R}$ is concentrated on $\{x_0\}\times[y,+\infty[$ or $\{x_0\}\times\,]-\infty,y]$ (if not it would be a bridge between $x$ and $x'$). Let $X^1\cup X^2$ be the partition of $X(\tau)$ induced by this remark and $\tau^i$ the restriction of $\tau$ to $X^i\times\mathbb{R}$ for $i=1,2$. Without loss of generality we can assume $x\in X^1$. Let us prove that $\tau^1$ and $\tau^2$ are in $V$. Actually they coincide with $\tau$ on vertical lines so that they satisfy Conditions (2) and (3). The total mass of $\tau^i$ on the horizontal lines that are not $\mathbb{R}\times\{y\}$ is zero too. Thus, as $\tau^i(\mathbb{R}^2)=0$, we obtain $\tau^i(X^i\times\{y\})=0$ for $i=1,2$. This yields Condition (1) for $\tau^1$ and $\tau^2$. Hence these measures are in $V$.

As $\tau^1\in V$, applying Condition (1) we obtain that any point $x'_1\in X^1$ such that $\tau(x'_1,y)<0$ is connected with $x$ by a bridge. Indeed with Condition (2) and the definition of $X^1$, we know that there are $y'$ and $y''$ in $]y,+\infty[$ such that $\tau(x,y')\neq 0$ and $\tau(x'_1,y'')\neq 0$. Hence we have $x\leftrightarrow x'_1$. So we can apply the first remark to $\tau^1$ in place of $\tau$. Actually $G$ is the set of points $x_1\in X(\tau^1)$ such that $\tau(x_1,y)=\tau^1(x_1,y)<0$. □

Lemma 8.9. Let $\tau$ be a finitely supported measure of $V$ and $\operatorname{spt}(\tau^+)=\{p_1,\dots,p_d\}\subseteq\mathbb{R}\times\mathbb{R}$. There exists $\varepsilon>0$ such that if $q_k\in\mathbb{R}^2$ has the same first coordinate as $p_k$ and $|p_k-q_k|<\varepsilon$ for every $k\in\{1,\dots,d\}$, then there exists a sequence $(\tau_k)_{k=1}^{d}$ in $V$ such that $|\tau_k|$ has finite support and $\tau_k^+$ has support $\{q_1,\dots,q_k,p_{k+1},\dots,p_d\}$.

Proof. Let $\varepsilon$ be a positive real number. Let us denote by $X$ the support of $\operatorname{proj}^x(|\tau_k|)$ for $k\in\{1,\dots,d\}$ (it will be the same set for any $k$). We explain how to build $\tau_k$ from $\tau_{k-1}$. Roughly speaking we are moving $p_k=(a,b)$ to a position $q_k=(a,b')$, where $|b'-b|<\varepsilon$. Doing this we have to take care to stay in $V$. The conditional measure $\tau_k|_x$ can easily be forced to preserve mass zero (Condition (2)) during this operation but there are two difficulties. The first is that for each $y$ the conditional measures $\tau_k|_y$ must have mass zero (Condition (1)). The second problem is that for each $x\in X$ the positive and the negative part of $\tau_k|_x$ must have the same mean (Condition (3)).

Let us go into details. We define $\tau_k$ from $\tau_{k-1}$ in two steps: the first step is a vertical translation. Applying Lemma 8.8 to $p_k=(a,b)$ we obtain a measure $m$ concentrated on $X(\tau)\times\{b\}$ that satisfies the following conditions:

• $m(\mathbb{R}^2)=0$,

• $m^+$ is concentrated on the point $p_k=(a,b)$ and $m(a,b)=\tau_{k-1}(a,b)$,

• $m^-$ is concentrated on a set $G\times\{b\}$ such that $x\in G$ is connected with $a$ by a bridge and $m^-\le\tau_{k-1}^-$.

Let us denote $m$ by $\zeta\otimes\delta_b$. We replace $\tau_{k-1}$ by $\tau'_{k-1}=\tau_{k-1}+\zeta\otimes(\delta_{b'}-\delta_b)$. Doing this we preserve Conditions (1) and (2), i.e. the measure is still in $V'$, but Condition (3) is possibly violated. Recall that $\zeta$ has mass zero. Since $M_x(\tau'_{k-1})=\zeta(\{x\})(b'-b)$ for every $x\in X$, it follows that

$$\sum_{x\in X}M_x(\tau'_{k-1})=0.$$
Using the bridges between $a$ and the elements of $G$ (these bridges are available for $\tau'_{k-1}$ as they were for $\tau_{k-1}$ assuming that $\varepsilon$ is sufficiently small) we can modify the measure and make $M_a$ and $M_x$ for $x\in G$ equal to 0. Call $\tau_k$ the result of this procedure. Observe that if the variations are sufficiently small then the points of positive mass are exactly $q_1,\dots,q_k,p_{k+1},\dots,p_d$ as we want. □

We can now prove Proposition 8.6. Let $\sigma\in V$ be of finite support as in the proof of Theorem 8.4. Observe that $\sigma$ can be written as a sum

$$\sigma=\sum_{k=1}^{d}\zeta_k\otimes\delta_{y_k}$$

where for $k\in\{1,\dots,d\}$ the signed measure $\zeta_k$ has its positive part concentrated in one point. Given $k$, let $\omega_k$ be a probability measure on $\mathbb{R}$ with expectation $y_k$ (the same as $\delta_{y_k}$). We consider

$$\sum_{k=1}^{d}\zeta_k\otimes\omega_k$$

and can easily convince ourselves that this measure is an element of $V$.

The proof of the proposition proceeds as follows. Consider the family of points $(r_1,\dots,r_d)$ of the support of $\sigma^+$ and pick $\varepsilon$ as in Lemma 8.5. For each point $r_k=(a,b)$ we consider a rational semi-open interval $B_k\ni b$ of diameter smaller than $\varepsilon$. Using Corollary 8.3 we obtain a family $(A_i)_{1\le i\le n}$ and we can assume that these sets are included in $[x_i-\varepsilon,x_i+\varepsilon]$. We fix a point $(t_1,\dots,t_n)$ of $A_1\times\cdots\times A_n$. For each $k\in\{1,\dots,d\}$ we can write $r_k$ in the form $(x_i,b)$. We have $\pi_{t_i}(B_k)>0$. Let now $p_k=(t_i,y_j)$ and $q_k=(t_i,y)$ where $y=\frac{1}{\pi_{t_i}(B_k)}\int_{B_k}u\,d\pi_{t_i}(u)$. Apply Lemma 8.9 to the measure $\sigma_0\in V$ obtained from $\sigma$ by translating horizontally the mass concentrated on each line $\{x_i\}\times\mathbb{R}$ to the line $\{t_i\}\times\mathbb{R}$. The other parameters $(p_1,\dots,p_d)$ and $(q_1,\dots,q_d)$ have just been given. Applying Lemma 8.9 we obtain a measure $\sigma_d\in V(\sigma,\varepsilon)$ concentrated on $\{t_1,\dots,t_n\}\times\mathbb{R}$.



Next we perform the transformation explained above where each $\omega_k$ has the form $\frac{1}{\pi_{t_i}(B_k)}\,\pi_{t_i}|_{B_k}$ for some $(i,k)$. The measure $W_d$ we obtain is in $V(\sigma,\varepsilon)$ but it may not satisfy the condition $W_d^+\le\sum_{i=1}^n\mu(A_i)\,\delta_{t_i}\otimes\pi_{t_i}$. However it holds for $w\,W_d^+$ if $w$ is a sufficiently small positive constant.